From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: [pve-devel] applied: [PATCH kronosnet] cherry-pick pmtud fix
Date: Wed, 16 Nov 2022 09:37:18 +0100
Message-ID: <5aa23d4a-ab3c-efe4-39cf-587991c0d653@proxmox.com>
In-Reply-To: <20221110152833.226134-1-f.gruenbichler@proxmox.com>
On 10/11/2022 at 16:28, Fabian Grünbichler wrote:
> as reported in https://forum.proxmox.com/threads/sudden-reboot-of-multiple-nodes-while-adding-a-new-node.116714/
>
> this patch just fixes a particular issue where a node joins (as in
> quorum membership change, not limited to PVE cluster join) an existing
> cluster, but has a lower MTU than the existing links to the already
> joined part of the cluster.
>
> i.e.:
>
> Node A: MTU 9000
> Node B: MTU 9000
> Node C: MTU 1500
>
> A & B are already up and running and have established that they can talk
> to each other with MTU 9000 (minus overhead). Now C joins as well - without
> the reset and re-schedule of MTU discovery in this patch, A and B will
> use MTU 9000 when talking to C, but those packets might never arrive
> (depending on network hardware and configuration). Since the heartbeat
> packets used to detect the link status are always small, they are able
> to arrive at C without any problems. If the network along the way
> doesn't reject the packets, but just drops them, the MTU discovery is
> also severely delayed (up to tens of minutes until the actual, low MTU
> is correctly detected!).
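
The scenario above can be sketched as a toy model. This is not knet code: the
`Link` class, the `fragment_size` helper and the 100-byte heartbeat size are
hypothetical names and values chosen for illustration, and protocol overhead is
ignored. It only shows why small heartbeats keep the link "up" while data
packets sized for a stale MTU of 9000 never reach node C.

```python
from dataclasses import dataclass

HEARTBEAT_SIZE = 100  # assumed size in bytes, for illustration only


@dataclass
class Link:
    peer: str
    path_mtu: int  # largest packet the network actually carries to this peer


def delivered(link: Link, packet_size: int) -> bool:
    """Model a blackholing network: oversized packets are silently dropped."""
    return packet_size <= link.path_mtu


def fragment_size(links, known_mtu: int, reset_on_join: bool) -> int:
    """Return the packet size used for data after a membership change."""
    if reset_on_join:
        # With the fix: discovery is restarted, so the lowest path MTU wins.
        return min(link.path_mtu for link in links)
    # Without the fix: the value established before the join is kept.
    return known_mtu


# Node A's view after C (MTU 1500) joins a cluster that settled on MTU 9000.
links_from_a = [Link("B", 9000), Link("C", 1500)]
link_c = links_from_a[1]

stale = fragment_size(links_from_a, known_mtu=9000, reset_on_join=False)
fresh = fragment_size(links_from_a, known_mtu=9000, reset_on_join=True)

print("heartbeat reaches C:", delivered(link_c, HEARTBEAT_SIZE))
print("data reaches C without reset:", delivered(link_c, stale))
print("data reaches C with reset:", delivered(link_c, fresh))
```

In this model the heartbeat always arrives, so the link looks healthy even
though every full-sized data packet to C is lost until the MTU is re-detected.
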
>
> In the regular case, the reset will be immediately followed by detecting
> the correct MTU for the new link (and depending on whether it's lower
> than the other links, the global MTU used for fragmenting by knet), and
> the window with additional overhead (smaller MTU => more fragmentation
> => more packets) should be fairly small. In case of a network blackhole
> negatively affecting MTU discovery, the window might be big, but without
> this patch, the result is a complete outage of the whole cluster, which
> is even less desirable than a cluster running with performance impacted.
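
The "smaller MTU => more fragmentation => more packets" cost mentioned above
can be estimated with simple arithmetic. The sketch below is an assumption-laden
illustration, not knet's actual fragmentation logic: `packets_needed` is a
hypothetical helper, the 64000-byte message size is arbitrary, and per-packet
overhead defaults to zero.

```python
def packets_needed(message_size: int, mtu: int, overhead: int = 0) -> int:
    """Number of packets needed to carry message_size bytes at a given MTU."""
    payload = mtu - overhead  # usable bytes per packet
    if payload <= 0:
        raise ValueError("overhead must be smaller than the MTU")
    # Integer ceiling division avoids floating-point rounding.
    return -(-message_size // payload)


message_size = 64000  # arbitrary example message, in bytes

jumbo = packets_needed(message_size, 9000)
standard = packets_needed(message_size, 1500)

print(f"MTU 9000: {jumbo} packets, MTU 1500: {standard} packets")
```

Under these assumptions the same message needs roughly five times as many
packets at MTU 1500, which is the temporary overhead the patch accepts in
exchange for avoiding a cluster-wide outage.
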
>
> Upstream is working on further improving similar failure scenarios, such as:
> - improved handling of MTU being lowered at runtime (either at the link
> level, or somewhere along the network path)
> - improving MTU discovery timeouts and intervals to speed up recovery
> even with blackholing networks
>
> These other changes are still work in progress and will follow at a
> later date.
>
> This patch is cherry-picked from upstream branch stable1-proposed
> (slated for inclusion in the next stable 1.x release of libknet).
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> We might evaluate setting netmtu to 1500-overhead in our cluster
> creation code to avoid MTU related issues - the net benefit for setting
> up high MTU for corosync traffic is likely negligible, and almost always
> a side-effect of re-using network links also used as uplinks or storage
> links.
>
> netmtu is used by corosync to fragment its messages *before* passing
> them to knet, avoiding the need to fragment at the knet layer. There is
> also a (new, git-only at the moment) corosync.conf option for setting
> the MTU used by knet, skipping the pMTU-discovered one entirely. We
> could cherry-pick and set this option as well in case we want to default
> to "non-jumbo MTU".
>
> ...eset-restart-pmtud-when-a-node-joins.patch | 156 ++++++++++++++++++
> debian/patches/series | 1 +
> 2 files changed, 157 insertions(+)
> create mode 100644 debian/patches/0001-pmtud-Reset-restart-pmtud-when-a-node-joins.patch
>
>
applied, thanks!