From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes
Date: Tue, 9 Nov 2021 13:07:41 +0100 [thread overview]
Message-ID: <20211109120742.3276270-4-f.gruenbichler@proxmox.com> (raw)
In-Reply-To: <20211109120742.3276270-1-f.gruenbichler@proxmox.com>
see https://github.com/corosync/corosync/issues/660 as well. these are
already queued for 1.23 and taken straight from stable1-proposed.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
...eq_num-initialization-race-condition.patch | 53 +++++++++++
...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
debian/patches/series | 3 +-
3 files changed, 147 insertions(+), 1 deletion(-)
create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
diff --git a/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
new file mode 100644
index 0000000..d01e0d4
--- /dev/null
+++ b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
@@ -0,0 +1,53 @@
+From 7eebe93c5039dad432bdd40101287e7fc04b3d10 Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Mon, 8 Nov 2021 09:14:22 +0100
+Subject: [PATCH 1/2] [host] fix dst_seq_num initialization race condition
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is a potential race condition where the sender
+is overloaded, sending data packets before pings
+can kick in and set the correct dst_seq_num.
+
+if this node is starting up (dst_seq_num = 0),
+it can start rejecing valid packets and get stuck.
+
+Set the dst_seq_num to the first seen packet and
+use that as reference instead.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/host.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+diff --git a/libknet/host.c b/libknet/host.c
+index ec73c0df..6fca01f8 100644
+--- a/libknet/host.c
++++ b/libknet/host.c
+@@ -573,6 +573,21 @@ int _seq_num_lookup(struct knet_host *host, seq_num_t seq_num, int defrag_buf, i
+ char *dst_cbuf_defrag = host->circular_buffer_defrag;
+ seq_num_t *dst_seq_num = &host->rx_seq_num;
+
++ /*
++ * There is a potential race condition where the sender
++ * is overloaded, sending data packets before pings
++ * can kick in and set the correct dst_seq_num.
++ *
++ * if this node is starting up (dst_seq_num = 0),
++ * it can start rejecing valid packets and get stuck.
++ *
++ * Set the dst_seq_num to the first seen packet and
++ * use that as reference instead.
++ */
++ if (!*dst_seq_num) {
++ *dst_seq_num = seq_num;
++ }
++
+ if (clear_buf) {
+ _clear_cbuffers(host, seq_num);
+ }
+--
+2.30.2
+
diff --git a/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
new file mode 100644
index 0000000..c8a9990
--- /dev/null
+++ b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
@@ -0,0 +1,92 @@
+From 1d52003ae7814ebf2b47c1ac3463c7d82486a5fd Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Sun, 7 Nov 2021 17:02:05 +0100
+Subject: [PATCH 2/2] [udp] use ICMP error messages to trigger faster link down
+ detection
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+this solves a possible race condition when:
+
+- node1 is running
+- node2 very fast
+- node1 does NOT have enough time to detect that node2 has gone
+ and reset the local seq numbers / buffers
+- node1 will start rejecting valid packets from node2
+
+There is still a potential minor race condition where app
+can restart so fast that kernel / network don't have time
+to generate an ICMP error. This will be addressed using
+instance id in onwire v2 protocol, as suggested by Jan F.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/transport_udp.c | 44 +++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 44 insertions(+)
+
+diff --git a/libknet/transport_udp.c b/libknet/transport_udp.c
+index 482c99b1..a1419c89 100644
+--- a/libknet/transport_udp.c
++++ b/libknet/transport_udp.c
+@@ -364,6 +364,46 @@ static int read_errs_from_sock(knet_handle_t knet_h, int sockfd)
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s destination unknown", addr_str, strerror(sock_err->ee_errno));
+ } else {
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s %s", addr_str, strerror(sock_err->ee_errno), addr_remote_str);
++ if ((sock_err->ee_errno == ECONNREFUSED) || /* knet is not running on the other node */
++ (sock_err->ee_errno == ECONNABORTED) || /* local kernel closed the socket */
++ (sock_err->ee_errno == ENONET) || /* network does not exist */
++ (sock_err->ee_errno == ENETUNREACH) || /* network unreachable */
++ (sock_err->ee_errno == EHOSTUNREACH) || /* host unreachable */
++ (sock_err->ee_errno == EHOSTDOWN) || /* host down (from kernel/net/ipv4/icmp.c */
++ (sock_err->ee_errno == ENETDOWN)) { /* network down */
++ struct knet_host *host = NULL;
++ struct knet_link *kn_link = NULL;
++ int link_idx, found = 0;
++
++ for (host = knet_h->host_head; host != NULL; host = host->next) {
++ for (link_idx = 0; link_idx < KNET_MAX_LINK; link_idx++) {
++ kn_link = &host->link[link_idx];
++ if (kn_link->outsock == sockfd) {
++ if (!cmpaddr(&remote, &kn_link->dst_addr)) {
++ found = 1;
++ break;
++ }
++ }
++ }
++ if (found) {
++ break;
++ }
++ }
++
++ if ((host) && (kn_link) &&
++ (kn_link->status.connected)) {
++ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Setting down host %u link %i", host->host_id, kn_link->link_id);
++ /*
++ * setting transport_connected = 0 will trigger
++ * thread_heartbeat link_down process.
++ *
++ * the process terminates calling into transport_link_down
++ * below that will set transport_connected = 1
++ */
++ kn_link->transport_connected = 0;
++ }
++
++ }
+ }
+ }
+ break;
+@@ -436,5 +476,9 @@ int udp_transport_link_dyn_connect(knet_handle_t knet_h, int sockfd, struct knet
+
+ int udp_transport_link_is_down(knet_handle_t knet_h, struct knet_link *kn_link)
+ {
++ /*
++ * see comments about handling ICMP error messages
++ */
++ kn_link->transport_connected = 1;
+ return 0;
+ }
+--
+2.30.2
+
diff --git a/debian/patches/series b/debian/patches/series
index 8b13789..16fba19 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1 +1,2 @@
-
+0001-host-fix-dst_seq_num-initialization-race-condition.patch
+0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
--
2.30.2
next prev parent reply other threads:[~2021-11-09 12:08 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2 Fabian Grünbichler
2021-11-09 12:07 ` Fabian Grünbichler [this message]
2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2 Fabian Grünbichler
2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
2021-11-09 12:54 ` [pve-devel] applied-series: " Fabian Grünbichler
2021-11-09 13:21 ` Eneko Lacunza
2021-11-09 17:39 ` Fabian Grünbichler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211109120742.3276270-4-f.gruenbichler@proxmox.com \
--to=f.gruenbichler@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal