From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: [pve-devel] applied: [PATCH kronosnet] cherry-pick pmtud fix
Date: Wed, 16 Nov 2022 09:37:18 +0100
Message-ID: <5aa23d4a-ab3c-efe4-39cf-587991c0d653@proxmox.com>
In-Reply-To: <20221110152833.226134-1-f.gruenbichler@proxmox.com>
On 10/11/2022 at 16:28, Fabian Grünbichler wrote:
> as reported in https://forum.proxmox.com/threads/sudden-reboot-of-multiple-nodes-while-adding-a-new-node.116714/
>
> this patch just fixes a particular issue where a node joins (as in
> quorum membership change, not limited to PVE cluster join) an existing
> cluster, but has a lower MTU than the existing links to the already
> joined part of the cluster.
>
> i.e.:
>
> Node A: MTU 9000
> Node B: MTU 9000
> Node C: MTU 1500
>
> A & B are already up and running and have established that they can talk
> to each other with MTU 9000 (minus overhead). Now C joins as well - without
> the reset and re-schedule of MTU discovery in this patch, A and B will
> use MTU 9000 when talking to C, but those packets might never arrive
> (depending on network hardware and configuration). Since the heartbeat
> packets used to detect the link status are always small, they are able
> to arrive at C without any problems. If the network along the way
> doesn't reject the packets, but just drops them, the MTU discovery is
> also severely delayed (up to tens of minutes until the actual, low MTU
> is correctly detected!).
>
> In the regular case, the reset will be immediately followed by detecting
> the correct MTU for the new link (and depending on whether it's lower
> than the other links, the global MTU used for fragmenting by knet), and
> the window with additional overhead (smaller MTU => more fragmentation
> => more packets) should be fairly small. In case of a network blackhole
> negatively affecting MTU discovery, the window might be big, but without
> this patch, the result is a complete outage of the whole cluster, which
> is even less desirable than a cluster running with performance impacted.
>
> Upstream is working on further improving similar failure scenarios, such as:
> - improved handling of MTU being lowered at runtime (either at the link
> level, or somewhere along the network path)
> - improving MTU discovery timeouts and intervals to speed up recovery
> even with blackholing networks
>
> These other changes are still work in progress and will follow at a
> later date.
>
> This patch is cherry-picked from upstream branch stable1-proposed
> (slated for inclusion in the next stable 1.x release of libknet).
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> We might evaluate setting netmtu to 1500-overhead in our cluster
> creation code to avoid MTU related issues - the net benefit for setting
> up high MTU for corosync traffic is likely negligible, and almost always
> a side-effect of re-using network links also used as uplinks or storage
> links.
>
> netmtu is used by corosync to fragment its messages *before* passing
> them to knet, avoiding the need to fragment at the knet layer. There is
> also a (new, git-only at the moment) corosync.conf option for setting
> the MTU used by knet, skipping the pMTU-discovered one entirely. We
> could cherry-pick and set this option as well in case we want to default
> to "non-jumbo MTU".
>
> ...eset-restart-pmtud-when-a-node-joins.patch | 156 ++++++++++++++++++
> debian/patches/series | 1 +
> 2 files changed, 157 insertions(+)
> create mode 100644 debian/patches/0001-pmtud-Reset-restart-pmtud-when-a-node-joins.patch
>
>
applied, thanks!
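
For readers of the archive, the join scenario from the quoted commit
message (A and B at MTU 9000, C joining at MTU 1500) can be sketched as a
toy model. This is not libknet code: the function names, the OVERHEAD
value and the heartbeat size are assumptions made up for illustration, and
the "smallest link MTU wins" rule is only a simplified reading of the
description above.

```python
# Toy model of the scenario in the commit message; NOT libknet code.
# OVERHEAD and HEARTBEAT_SIZE are hypothetical values for illustration.
OVERHEAD = 100        # assumed per-packet overhead in bytes
HEARTBEAT_SIZE = 64   # assumed size of a small link-status packet


def usable_mtu(link_mtus):
    """Global data MTU: smallest known link MTU minus overhead."""
    return min(link_mtus.values()) - OVERHEAD


def delivered(wire_size, link_mtu):
    """A blackholing path silently drops packets larger than the link MTU."""
    return wire_size <= link_mtu


# A's view before C joins: only the link to B has been probed.
discovered = {"B": 9000}
stale_mtu = usable_mtu(discovered)

# C joins with a lower MTU. These are the real path MTUs from A.
real_links = {"B": 9000, "C": 1500}

# Without a reset, A keeps fragmenting for the stale MTU.
heartbeat_ok = delivered(HEARTBEAT_SIZE, real_links["C"])
stale_data_ok = delivered(stale_mtu + OVERHEAD, real_links["C"])

# With the reset, discovery runs again and includes the new link.
fresh_mtu = usable_mtu(real_links)
fresh_data_ok = delivered(fresh_mtu + OVERHEAD, real_links["C"])

print("heartbeat reaches C:", heartbeat_ok)
print("data with stale MTU reaches C:", stale_data_ok)
print("data after re-discovery reaches C:", fresh_data_ok)
```

In this model the link to C looks healthy because heartbeats arrive, while
full-size data packets are dropped until discovery is re-run for the new
member.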