From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes
Date: Tue, 9 Nov 2021 13:07:41 +0100 [thread overview]
Message-ID: <20211109120742.3276270-4-f.gruenbichler@proxmox.com> (raw)
In-Reply-To: <20211109120742.3276270-1-f.gruenbichler@proxmox.com>
see https://github.com/corosync/corosync/issues/660 as well. these are
already queued for 1.23 and taken straight from stable1-proposed.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
...eq_num-initialization-race-condition.patch | 53 +++++++++++
...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
debian/patches/series | 3 +-
3 files changed, 147 insertions(+), 1 deletion(-)
create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
diff --git a/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
new file mode 100644
index 0000000..d01e0d4
--- /dev/null
+++ b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
@@ -0,0 +1,53 @@
+From 7eebe93c5039dad432bdd40101287e7fc04b3d10 Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Mon, 8 Nov 2021 09:14:22 +0100
+Subject: [PATCH 1/2] [host] fix dst_seq_num initialization race condition
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is a potential race condition where the sender
+is overloaded, sending data packets before pings
+can kick in and set the correct dst_seq_num.
+
+if this node is starting up (dst_seq_num = 0),
+it can start rejecing valid packets and get stuck.
+
+Set the dst_seq_num to the first seen packet and
+use that as reference instead.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/host.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+diff --git a/libknet/host.c b/libknet/host.c
+index ec73c0df..6fca01f8 100644
+--- a/libknet/host.c
++++ b/libknet/host.c
+@@ -573,6 +573,21 @@ int _seq_num_lookup(struct knet_host *host, seq_num_t seq_num, int defrag_buf, i
+ char *dst_cbuf_defrag = host->circular_buffer_defrag;
+ seq_num_t *dst_seq_num = &host->rx_seq_num;
+
++ /*
++ * There is a potential race condition where the sender
++ * is overloaded, sending data packets before pings
++ * can kick in and set the correct dst_seq_num.
++ *
++ * if this node is starting up (dst_seq_num = 0),
++ * it can start rejecing valid packets and get stuck.
++ *
++ * Set the dst_seq_num to the first seen packet and
++ * use that as reference instead.
++ */
++ if (!*dst_seq_num) {
++ *dst_seq_num = seq_num;
++ }
++
+ if (clear_buf) {
+ _clear_cbuffers(host, seq_num);
+ }
+--
+2.30.2
+
diff --git a/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
new file mode 100644
index 0000000..c8a9990
--- /dev/null
+++ b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
@@ -0,0 +1,92 @@
+From 1d52003ae7814ebf2b47c1ac3463c7d82486a5fd Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Sun, 7 Nov 2021 17:02:05 +0100
+Subject: [PATCH 2/2] [udp] use ICMP error messages to trigger faster link down
+ detection
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+this solves a possible race condition when:
+
+- node1 is running
+- node2 very fast
+- node1 does NOT have enough time to detect that node2 has gone
+ and reset the local seq numbers / buffers
+- node1 will start rejecting valid packets from node2
+
+There is still a potential minor race condition where app
+can restart so fast that kernel / network don't have time
+to generate an ICMP error. This will be addressed using
+instance id in onwire v2 protocol, as suggested by Jan F.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/transport_udp.c | 44 +++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 44 insertions(+)
+
+diff --git a/libknet/transport_udp.c b/libknet/transport_udp.c
+index 482c99b1..a1419c89 100644
+--- a/libknet/transport_udp.c
++++ b/libknet/transport_udp.c
+@@ -364,6 +364,46 @@ static int read_errs_from_sock(knet_handle_t knet_h, int sockfd)
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s destination unknown", addr_str, strerror(sock_err->ee_errno));
+ } else {
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s %s", addr_str, strerror(sock_err->ee_errno), addr_remote_str);
++ if ((sock_err->ee_errno == ECONNREFUSED) || /* knet is not running on the other node */
++ (sock_err->ee_errno == ECONNABORTED) || /* local kernel closed the socket */
++ (sock_err->ee_errno == ENONET) || /* network does not exist */
++ (sock_err->ee_errno == ENETUNREACH) || /* network unreachable */
++ (sock_err->ee_errno == EHOSTUNREACH) || /* host unreachable */
++ (sock_err->ee_errno == EHOSTDOWN) || /* host down (from kernel/net/ipv4/icmp.c */
++ (sock_err->ee_errno == ENETDOWN)) { /* network down */
++ struct knet_host *host = NULL;
++ struct knet_link *kn_link = NULL;
++ int link_idx, found = 0;
++
++ for (host = knet_h->host_head; host != NULL; host = host->next) {
++ for (link_idx = 0; link_idx < KNET_MAX_LINK; link_idx++) {
++ kn_link = &host->link[link_idx];
++ if (kn_link->outsock == sockfd) {
++ if (!cmpaddr(&remote, &kn_link->dst_addr)) {
++ found = 1;
++ break;
++ }
++ }
++ }
++ if (found) {
++ break;
++ }
++ }
++
++ if ((host) && (kn_link) &&
++ (kn_link->status.connected)) {
++ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Setting down host %u link %i", host->host_id, kn_link->link_id);
++ /*
++ * setting transport_connected = 0 will trigger
++ * thread_heartbeat link_down process.
++ *
++ * the process terminates calling into transport_link_down
++ * below that will set transport_connected = 1
++ */
++ kn_link->transport_connected = 0;
++ }
++
++ }
+ }
+ }
+ break;
+@@ -436,5 +476,9 @@ int udp_transport_link_dyn_connect(knet_handle_t knet_h, int sockfd, struct knet
+
+ int udp_transport_link_is_down(knet_handle_t knet_h, struct knet_link *kn_link)
+ {
++ /*
++ * see comments about handling ICMP error messages
++ */
++ kn_link->transport_connected = 1;
+ return 0;
+ }
+--
+2.30.2
+
diff --git a/debian/patches/series b/debian/patches/series
index 8b13789..16fba19 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1 +1,2 @@
-
+0001-host-fix-dst_seq_num-initialization-race-condition.patch
+0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
--
2.30.2
next prev parent reply other threads:[~2021-11-09 12:08 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2 Fabian Grünbichler
2021-11-09 12:07 ` Fabian Grünbichler [this message]
2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2 Fabian Grünbichler
2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
2021-11-09 12:54 ` [pve-devel] applied-series: " Fabian Grünbichler
2021-11-09 13:21 ` Eneko Lacunza
2021-11-09 17:39 ` Fabian Grünbichler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211109120742.3276270-4-f.gruenbichler@proxmox.com \
--to=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox