From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id EE3A67D9ED for ; Tue, 9 Nov 2021 13:08:04 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id EC4C1CDF7 for ; Tue, 9 Nov 2021 13:08:04 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id 324FACDA2 for ; Tue, 9 Nov 2021 13:08:01 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id EE5D942820 for ; Tue, 9 Nov 2021 13:08:00 +0100 (CET) From: =?UTF-8?q?Fabian=20Gr=C3=BCnbichler?= To: pve-devel@lists.proxmox.com Date: Tue, 9 Nov 2021 13:07:41 +0100 Message-Id: <20211109120742.3276270-4-f.gruenbichler@proxmox.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211109120742.3276270-1-f.gruenbichler@proxmox.com> References: <20211109120742.3276270-1-f.gruenbichler@proxmox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.292 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Nov 2021 12:08:05 -0000 see https://github.com/corosync/corosync/issues/660 as well. these are already queued for 1.23 and taken straight from stable1-proposed. Signed-off-by: Fabian Grünbichler --- ...eq_num-initialization-race-condition.patch | 53 +++++++++++ ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++ debian/patches/series | 3 +- 3 files changed, 147 insertions(+), 1 deletion(-) create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch diff --git a/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch new file mode 100644 index 0000000..d01e0d4 --- /dev/null +++ b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch @@ -0,0 +1,53 @@ +From 7eebe93c5039dad432bdd40101287e7fc04b3d10 Mon Sep 17 00:00:00 2001 +From: "Fabio M. Di Nitto" +Date: Mon, 8 Nov 2021 09:14:22 +0100 +Subject: [PATCH 1/2] [host] fix dst_seq_num initialization race condition +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +There is a potential race condition where the sender +is overloaded, sending data packets before pings +can kick in and set the correct dst_seq_num. + +if this node is starting up (dst_seq_num = 0), +it can start rejecing valid packets and get stuck. + +Set the dst_seq_num to the first seen packet and +use that as reference instead. + +Signed-off-by: Fabio M. Di Nitto +Signed-off-by: Fabian Grünbichler +--- + libknet/host.c | 15 +++++++++++++++ + 1 file changed, 15 insertions(+) + +diff --git a/libknet/host.c b/libknet/host.c +index ec73c0df..6fca01f8 100644 +--- a/libknet/host.c ++++ b/libknet/host.c +@@ -573,6 +573,21 @@ int _seq_num_lookup(struct knet_host *host, seq_num_t seq_num, int defrag_buf, i + char *dst_cbuf_defrag = host->circular_buffer_defrag; + seq_num_t *dst_seq_num = &host->rx_seq_num; + ++ /* ++ * There is a potential race condition where the sender ++ * is overloaded, sending data packets before pings ++ * can kick in and set the correct dst_seq_num. ++ * ++ * if this node is starting up (dst_seq_num = 0), ++ * it can start rejecing valid packets and get stuck. ++ * ++ * Set the dst_seq_num to the first seen packet and ++ * use that as reference instead. ++ */ ++ if (!*dst_seq_num) { ++ *dst_seq_num = seq_num; ++ } ++ + if (clear_buf) { + _clear_cbuffers(host, seq_num); + } +-- +2.30.2 + diff --git a/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch new file mode 100644 index 0000000..c8a9990 --- /dev/null +++ b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch @@ -0,0 +1,92 @@ +From 1d52003ae7814ebf2b47c1ac3463c7d82486a5fd Mon Sep 17 00:00:00 2001 +From: "Fabio M. Di Nitto" +Date: Sun, 7 Nov 2021 17:02:05 +0100 +Subject: [PATCH 2/2] [udp] use ICMP error messages to trigger faster link down + detection +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +this solves a possible race condition when: + +- node1 is running +- node2 very fast +- node1 does NOT have enough time to detect that node2 has gone + and reset the local seq numbers / buffers +- node1 will start rejecting valid packets from node2 + +There is still a potential minor race condition where app +can restart so fast that kernel / network don't have time +to generate an ICMP error. This will be addressed using +instance id in onwire v2 protocol, as suggested by Jan F. + +Signed-off-by: Fabio M. Di Nitto +Signed-off-by: Fabian Grünbichler +--- + libknet/transport_udp.c | 44 +++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 44 insertions(+) + +diff --git a/libknet/transport_udp.c b/libknet/transport_udp.c +index 482c99b1..a1419c89 100644 +--- a/libknet/transport_udp.c ++++ b/libknet/transport_udp.c +@@ -364,6 +364,46 @@ static int read_errs_from_sock(knet_handle_t knet_h, int sockfd) + log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s destination unknown", addr_str, strerror(sock_err->ee_errno)); + } else { + log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s %s", addr_str, strerror(sock_err->ee_errno), addr_remote_str); ++ if ((sock_err->ee_errno == ECONNREFUSED) || /* knet is not running on the other node */ ++ (sock_err->ee_errno == ECONNABORTED) || /* local kernel closed the socket */ ++ (sock_err->ee_errno == ENONET) || /* network does not exist */ ++ (sock_err->ee_errno == ENETUNREACH) || /* network unreachable */ ++ (sock_err->ee_errno == EHOSTUNREACH) || /* host unreachable */ ++ (sock_err->ee_errno == EHOSTDOWN) || /* host down (from kernel/net/ipv4/icmp.c */ ++ (sock_err->ee_errno == ENETDOWN)) { /* network down */ ++ struct knet_host *host = NULL; ++ struct knet_link *kn_link = NULL; ++ int link_idx, found = 0; ++ ++ for (host = knet_h->host_head; host != NULL; host = host->next) { ++ for (link_idx = 0; link_idx < KNET_MAX_LINK; link_idx++) { ++ kn_link = &host->link[link_idx]; ++ if (kn_link->outsock == sockfd) { ++ if (!cmpaddr(&remote, &kn_link->dst_addr)) { ++ found = 1; ++ break; ++ } ++ } ++ } ++ if (found) { ++ break; ++ } ++ } ++ ++ if ((host) && (kn_link) && ++ (kn_link->status.connected)) { ++ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Setting down host %u link %i", host->host_id, kn_link->link_id); ++ /* ++ * setting transport_connected = 0 will trigger ++ * thread_heartbeat link_down process. ++ * ++ * the process terminates calling into transport_link_down ++ * below that will set transport_connected = 1 ++ */ ++ kn_link->transport_connected = 0; ++ } ++ ++ } + } + } + break; +@@ -436,5 +476,9 @@ int udp_transport_link_dyn_connect(knet_handle_t knet_h, int sockfd, struct knet + + int udp_transport_link_is_down(knet_handle_t knet_h, struct knet_link *kn_link) + { ++ /* ++ * see comments about handling ICMP error messages ++ */ ++ kn_link->transport_connected = 1; + return 0; + } +-- +2.30.2 + diff --git a/debian/patches/series b/debian/patches/series index 8b13789..16fba19 100644 --- a/debian/patches/series +++ b/debian/patches/series @@ -1 +1,2 @@ - +0001-host-fix-dst_seq_num-initialization-race-condition.patch +0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch -- 2.30.2