From: Thomas Lamprecht
To: Proxmox VE development discussion, Fabian Grünbichler
Date: Wed, 16 Nov 2022 09:37:18 +0100
Subject: [pve-devel] applied: [PATCH kronosnet] cherry-pick pmtud fix
Message-ID: <5aa23d4a-ab3c-efe4-39cf-587991c0d653@proxmox.com>
In-Reply-To: <20221110152833.226134-1-f.gruenbichler@proxmox.com>
References: <20221110152833.226134-1-f.gruenbichler@proxmox.com>

On 10/11/2022 at 16:28, Fabian Grünbichler wrote:
> as reported in https://forum.proxmox.com/threads/sudden-reboot-of-multiple-nodes-while-adding-a-new-node.116714/
>
> this patch just fixes a particular issue where a node joins (as in
> quorum membership change, not limited to PVE cluster join) an existing
> cluster, but has a lower MTU than the existing links to the already
> joined part of the cluster.
>
> e.g.:
>
> Node A: MTU 9000
> Node B: MTU 9000
> Node C: MTU 1500
>
> A & B are already up and running and have established that they can talk
> to each other with MTU 9000 (-overhead). Now C joins as well - without
> the reset and re-schedule of MTU discovery in this patch, A and B will
> use MTU 9000 when talking to C, but those packets might never arrive
> (depending on network hardware and configuration). Since the heartbeat
> packets used to detect the link status are always small, they are able
> to arrive at C without any problems. If the network along the way
> doesn't reject the packets, but just drops them, the MTU discovery is
> also severely delayed (up to tens of minutes until the actual, low MTU
> is correctly detected!).
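The join scenario quoted above can be modelled in a few lines. This is a toy sketch, not knet's code: the `Cluster` class, the `scenario` helper, and `FALLBACK_MTU` with its value of 1280 are made up for illustration, protocol overhead is ignored, and `discover()` simply pretends PMTU discovery finished instantly.

```python
FALLBACK_MTU = 1280  # hypothetical safe value used right after a reset


class Cluster:
    """Toy model: one global MTU for fragmenting, one working MTU per link."""

    def __init__(self, reset_on_join):
        self.reset_on_join = reset_on_join
        self.link_mtu = {}    # node name -> largest packet that link delivers
        self.data_mtu = None  # MTU currently used for fragmenting

    def discover(self):
        # Stand-in for PMTU discovery: settle on the smallest link MTU.
        self.data_mtu = min(self.link_mtu.values())

    def join(self, node, mtu):
        self.link_mtu[node] = mtu
        if self.reset_on_join:
            # Assumed behaviour: drop to a safe MTU until discovery reruns.
            self.data_mtu = FALLBACK_MTU

    def deliverable(self, node):
        # Full-size packets arrive only if they fit the link to that node.
        return self.data_mtu <= self.link_mtu[node]


def scenario(reset_on_join):
    c = Cluster(reset_on_join)
    c.join("A", 9000)
    c.join("B", 9000)
    c.discover()       # A and B settle on 9000
    c.join("C", 1500)  # C joins before discovery has rerun
    return c


old, new = scenario(False), scenario(True)
print(old.data_mtu, old.deliverable("C"))
print(new.data_mtu, new.deliverable("C"))
new.discover()
print(new.data_mtu)
```

Without the reset, the model keeps fragmenting at 9000 and full-size packets to C do not fit. With the reset, it falls back to the small assumed value until discovery reruns and settles on 1500.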
>
> In the regular case, the reset will be immediately followed by detecting
> the correct MTU for the new link (and depending on whether it's lower
> than the other links, the global MTU used for fragmenting by knet), and
> the window with additional overhead (smaller MTU => more fragmentation
> => more packets) should be fairly small. In case of a network blackhole
> negatively affecting MTU discovery, the window might be big, but without
> this patch, the result is a complete outage of the whole cluster, which
> is even less desirable than a cluster running with performance impacted.
>
> Upstream is working on further improving similar failure scenarios, such as:
> - improved handling of MTU being lowered at runtime (either at the link
>   level, or somewhere along the network path)
> - improving MTU discovery timeouts and intervals to speed up recovery
>   even with blackholing networks
>
> These other changes are still work in progress and will follow at a
> later date.
>
> This patch is cherry-picked from upstream branch stable1-proposed
> (slated for inclusion in the next stable 1.x release of libknet).
>
> Signed-off-by: Fabian Grünbichler
> ---
> We might evaluate setting netmtu to 1500-overhead in our cluster
> creation code to avoid MTU related issues - the net benefit for setting
> up high MTU for corosync traffic is likely negligible, and almost always
> a side-effect of re-using network links also used as uplinks or storage
> links.
>
> netmtu is used by corosync to fragment its messages *before* passing
> them to knet, avoiding the need to fragment at the knet layer. There is
> also a (new, git-only at the moment) corosync.conf option for setting
> the MTU used by knet, skipping the pMTU-discovered one entirely. We
> could cherry-pick and set this option as well in case we want to default
> to "non-jumbo MTU".
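To put rough numbers on "smaller MTU => more fragmentation => more packets", here is a small sketch that counts the fragments one message needs at a jumbo and at a standard MTU. The function name, the 100,000-byte message and the 64-byte per-packet overhead are assumptions for illustration only; the real knet/corosync overhead is not stated in this thread.

```python
import math

ASSUMED_OVERHEAD = 64  # placeholder, not the real knet/corosync header size


def fragments_needed(message_bytes, mtu, overhead=ASSUMED_OVERHEAD):
    """Number of packets needed to carry one message at a given MTU."""
    payload = mtu - overhead  # bytes of message data per packet
    if payload <= 0:
        raise ValueError("MTU must exceed the per-packet overhead")
    return math.ceil(message_bytes / payload)


for mtu in (9000, 1500):
    print(mtu, fragments_needed(100_000, mtu))
```

Under these assumptions the same message takes 12 packets at MTU 9000 and 70 at MTU 1500, which is the performance cost being weighed against a complete outage.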
>
>  ...eset-restart-pmtud-when-a-node-joins.patch | 156 ++++++++++++++++++
>  debian/patches/series | 1 +
>  2 files changed, 157 insertions(+)
>  create mode 100644 debian/patches/0001-pmtud-Reset-restart-pmtud-when-a-node-joins.patch
>
>

applied, thanks!