public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes
@ 2021-11-09 12:07 Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:07 UTC (permalink / raw)
  To: pve-devel

culmination of 4 weeks of triaging together with the respective upstream
devs and endless hours staring at corosync debug traces, this fixes the
following issues:

- knet losing join messages if network is overloaded, pushing corosync
  into a retransmit loop, potentially causing a full-cluster fence event
  with just a single node acting up
- corosync potentially corrupting messages during membership changes

and another one reported by someone else:

- corosync causing high network load by not holding the token in case
  messages are queued for retransmission

all of the fixes are taken from the respective stable queue with
releases slated for later this week.
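
for reference, the token-hold fix is guarded by a new totem option; a
rough corosync.conf sketch is below (the surrounding keys are just
placeholders, and since the default is "no" the fixed behaviour is
active without any config change):

```
totem {
    # ... cluster_name, config_version, interface blocks as usual ...

    # added by patch 0003: with "no" (the default) the representative
    # keeps holding the token while retransmit messages are queued; set
    # to "yes" to get the old always-cancel-the-hold behaviour back
    cancel_token_hold_on_retransmit: no
}
```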

corosync:

Fabian Grünbichler (2):
  cherry-pick fixes
  bump version to 3.1.5-pve2

 ...cel_hold_on_retransmit-config-option.patch | 132 ++++++++++++++++++
 ...ch-totempg-buffers-at-the-right-time.patch | 113 +++++++++++++++
 debian/changelog                              |   8 ++
 debian/patches/series                         |   2 +
 4 files changed, 255 insertions(+)
 create mode 100644 debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
 create mode 100644 debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch

kronosnet:

Fabian Grünbichler (2):
  fix #3672: cherry-pick knet fixes
  bump version to 1.22-pve2

 ...eq_num-initialization-race-condition.patch | 53 +++++++++++
 ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
 debian/changelog                              |  6 ++
 debian/patches/series                         |  3 +-
 4 files changed, 153 insertions(+), 1 deletion(-)
 create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
 create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch

-- 
2.30.2






* [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes
  2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
@ 2021-11-09 12:07 ` Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2 Fabian Grünbichler
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:07 UTC (permalink / raw)
  To: pve-devel

patch #3 should reduce network load in recovery situations
patch #4 fixes a cpg corruption issue discovered while investigating the
knet sequence number bug

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 ...cel_hold_on_retransmit-config-option.patch | 132 ++++++++++++++++++
 ...ch-totempg-buffers-at-the-right-time.patch | 113 +++++++++++++++
 debian/patches/series                         |   2 +
 3 files changed, 247 insertions(+)
 create mode 100644 debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
 create mode 100644 debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch

diff --git a/debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch b/debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
new file mode 100644
index 0000000..7fd66cf
--- /dev/null
+++ b/debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
@@ -0,0 +1,132 @@
+From cdf72925db5a81e546ca8e8d7d8291ee1fc77be4 Mon Sep 17 00:00:00 2001
+From: Jan Friesse <jfriesse@redhat.com>
+Date: Wed, 11 Aug 2021 17:34:05 +0200
+Subject: [PATCH 3/4] totem: Add cancel_hold_on_retransmit config option
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Previously, existence of retransmit messages canceled holding
+of token (and never allowed representative to enter token hold
+state).
+
+This makes token rotating maximum speed and keeps processor
+resending messages over and over again - overloading network
+and reducing chance to successfully deliver the messages.
+
+Also there were reports of various Antivirus / IPS / IDS which slows
+down delivery of packets with certain sizes (packets bigger than token)
+what make Corosync retransmit messages over and over again.
+
+Proposed solution is to allow representative to enter token hold
+state when there are only retransmit messages. This allows network to
+handle overload and/or gives Antivirus/IPS/IDS enough time scan and
+deliver packets without corosync entering "FAILED TO RECEIVE" state and
+adding more load to network.
+
+Signed-off-by: Jan Friesse <jfriesse@redhat.com>
+Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ include/corosync/totem/totem.h |  2 ++
+ exec/totemconfig.c             |  6 ++++++
+ exec/totemsrp.c                |  5 +++--
+ man/corosync.conf.5            | 15 ++++++++++++++-
+ 4 files changed, 25 insertions(+), 3 deletions(-)
+
+diff --git a/include/corosync/totem/totem.h b/include/corosync/totem/totem.h
+index 8b166566..bdb6a15f 100644
+--- a/include/corosync/totem/totem.h
++++ b/include/corosync/totem/totem.h
+@@ -244,6 +244,8 @@ struct totem_config {
+ 
+ 	unsigned int block_unlisted_ips;
+ 
++	unsigned int cancel_token_hold_on_retransmit;
++
+ 	void (*totem_memb_ring_id_create_or_load) (
+ 	    struct memb_ring_id *memb_ring_id,
+ 	    unsigned int nodeid);
+diff --git a/exec/totemconfig.c b/exec/totemconfig.c
+index 57a1587a..46e09952 100644
+--- a/exec/totemconfig.c
++++ b/exec/totemconfig.c
+@@ -81,6 +81,7 @@
+ #define MAX_MESSAGES				17
+ #define MISS_COUNT_CONST			5
+ #define BLOCK_UNLISTED_IPS			1
++#define CANCEL_TOKEN_HOLD_ON_RETRANSMIT		0
+ /* This constant is not used for knet */
+ #define UDP_NETMTU                              1500
+ 
+@@ -144,6 +145,8 @@ static void *totem_get_param_by_name(struct totem_config *totem_config, const ch
+ 		return totem_config->knet_compression_model;
+ 	if (strcmp(param_name, "totem.block_unlisted_ips") == 0)
+ 		return &totem_config->block_unlisted_ips;
++	if (strcmp(param_name, "totem.cancel_token_hold_on_retransmit") == 0)
++		return &totem_config->cancel_token_hold_on_retransmit;
+ 
+ 	return NULL;
+ }
+@@ -365,6 +368,9 @@ void totem_volatile_config_read (struct totem_config *totem_config, icmap_map_t
+ 
+ 	totem_volatile_config_set_boolean_value(totem_config, temp_map, "totem.block_unlisted_ips", deleted_key,
+ 	    BLOCK_UNLISTED_IPS);
++
++	totem_volatile_config_set_boolean_value(totem_config, temp_map, "totem.cancel_token_hold_on_retransmit",
++	    deleted_key, CANCEL_TOKEN_HOLD_ON_RETRANSMIT);
+ }
+ 
+ int totem_volatile_config_validate (
+diff --git a/exec/totemsrp.c b/exec/totemsrp.c
+index 949d367b..d24b11fa 100644
+--- a/exec/totemsrp.c
++++ b/exec/totemsrp.c
+@@ -3981,8 +3981,9 @@ static int message_handler_orf_token (
+ 		transmits_allowed = fcc_calculate (instance, token);
+ 		mcasted_retransmit = orf_token_rtr (instance, token, &transmits_allowed);
+ 
+-		if (instance->my_token_held == 1 &&
+-			(token->rtr_list_entries > 0 || mcasted_retransmit > 0)) {
++		if (instance->totem_config->cancel_token_hold_on_retransmit &&
++		    instance->my_token_held == 1 &&
++		    (token->rtr_list_entries > 0 || mcasted_retransmit > 0)) {
+ 			instance->my_token_held = 0;
+ 			forward_token = 1;
+ 		}
+diff --git a/man/corosync.conf.5 b/man/corosync.conf.5
+index 0588ad1e..a3771ea7 100644
+--- a/man/corosync.conf.5
++++ b/man/corosync.conf.5
+@@ -32,7 +32,7 @@
+ .\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ .\" * THE POSSIBILITY OF SUCH DAMAGE.
+ .\" */
+-.TH COROSYNC_CONF 5 2021-07-23 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
++.TH COROSYNC_CONF 5 2021-08-11 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
+ .SH NAME
+ corosync.conf - corosync executive configuration file
+ 
+@@ -584,6 +584,19 @@ with an old configuration.
+ 
+ The default value is yes.
+ 
++.TP
++cancel_token_hold_on_retransmit
++Allows Corosync to hold token by representative when there is too much
++retransmit messages. This allows network to process increased load without
++overloading it. Used mechanism is same as described for
++.B hold
++directive.
++
++Some deployments may prefer to never hold token when there is
++retransmit messages. If so, option should be set to yes.
++
++The default value is no.
++
+ .PP
+ Within the
+ .B logging
+-- 
+2.30.2
+
diff --git a/debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch b/debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
new file mode 100644
index 0000000..2ef9215
--- /dev/null
+++ b/debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
@@ -0,0 +1,113 @@
+From 7dce8bc0066c7c76eeb26cc8f6fe4de3221d6798 Mon Sep 17 00:00:00 2001
+From: Jan Friesse <jfriesse@redhat.com>
+Date: Tue, 26 Oct 2021 18:17:59 +0200
+Subject: [PATCH 4/4] totemsrp: Switch totempg buffers at the right time
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Commit 92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc added switching of
+totempg buffers in sync phase. But because buffers got switch too early
+there was a problem when delivering recovered messages (messages got
+corrupted and/or lost). Solution is to switch buffers after recovered
+messages got delivered.
+
+I think it is worth to describe complete history with reproducers so it
+doesn't get lost.
+
+It all started with 402638929e5045ef520a7339696c687fbed0b31b (more info
+about original problem is described in
+https://bugzilla.redhat.com/show_bug.cgi?id=820821). This patch
+solves problem which is way to be reproduced with following reproducer:
+- 2 nodes
+- Both nodes running corosync and testcpg
+- Pause node 1 (SIGSTOP of corosync)
+- On node 1, send some messages by testcpg
+  (it's not answering but this doesn't matter). Simply hit ENTER key
+  few times is enough)
+- Wait till node 2 detects that node 1 left
+- Unpause node 1 (SIGCONT of corosync)
+
+and on node 1 newly mcasted cpg messages got sent before sync barrier,
+so node 2 logs "Unknown node -> we will not deliver message".
+
+Solution was to add switch of totemsrp new messages buffer.
+
+This patch was not enough so new one
+(92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc) was created. Reproducer of
+problem was similar, just cpgverify was used instead of testcpg.
+Occasionally when node 1 was unpaused it hang in sync phase because
+there was a partial message in totempg buffers. New sync message had
+different frag cont so it was thrown away and never delivered.
+
+After many years problem was found which is solved by this patch
+(original issue describe in
+https://github.com/corosync/corosync/issues/660).
+Reproducer is more complex:
+- 2 nodes
+- Node 1 is rate-limited (used script on the hypervisor side):
+  ```
+  iface=tapXXXX
+  # ~0.1MB/s in bit/s
+  rate=838856
+  # 1mb/s
+  burst=1048576
+  tc qdisc add dev $iface root handle 1: htb default 1
+  tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
+    burst ${burst}b
+  tc qdisc add dev $iface handle ffff: ingress
+  tc filter add dev $iface parent ffff: prio 50 basic police rate \
+    ${rate}bps burst ${burst}b mtu 64kb "drop"
+  ```
+- Node 2 is running corosync and cpgverify
+- Node 1 keeps restarting of corosync and running cpgverify in cycle
+  - Console 1: while true; do corosync; sleep 20; \
+      kill $(pidof corosync); sleep 20; done
+  - Console 2: while true; do ./cpgverify;done
+
+And from time to time (reproduced usually in less than 5 minutes)
+cpgverify reports corrupted message.
+
+Signed-off-by: Jan Friesse <jfriesse@redhat.com>
+Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ exec/totemsrp.c | 16 +++++++++++++++-
+ 1 file changed, 15 insertions(+), 1 deletion(-)
+
+diff --git a/exec/totemsrp.c b/exec/totemsrp.c
+index d24b11fa..fd71771b 100644
+--- a/exec/totemsrp.c
++++ b/exec/totemsrp.c
+@@ -1989,13 +1989,27 @@ static void memb_state_operational_enter (struct totemsrp_instance *instance)
+ 		trans_memb_list_totemip, instance->my_trans_memb_entries,
+ 		left_list, instance->my_left_memb_entries,
+ 		0, 0, &instance->my_ring_id);
++	/*
++	 * Switch new totemsrp messages queue. Messages sent from now on are stored
++	 * in different queue so synchronization messages are delivered first. Totempg
++	 * buffers will be switched later.
++	 */
+ 	instance->waiting_trans_ack = 1;
+-	instance->totemsrp_waiting_trans_ack_cb_fn (1);
+ 
+ // TODO we need to filter to ensure we only deliver those
+ // messages which are part of instance->my_deliver_memb
+ 	messages_deliver_to_app (instance, 1, instance->old_ring_state_high_seq_received);
+ 
++	/*
++	 * Switch totempg buffers. This used to be right after
++	 *   instance->waiting_trans_ack = 1;
++	 * line. This was causing problem, because there may be not yet
++	 * processed parts of messages in totempg buffers.
++	 * So when buffers were switched and recovered messages
++	 * got delivered it was not possible to assemble them.
++	 */
++	instance->totemsrp_waiting_trans_ack_cb_fn (1);
++
+ 	instance->my_aru = aru_save;
+ 
+ 	/*
+-- 
+2.30.2
+
diff --git a/debian/patches/series b/debian/patches/series
index fd3a2f0..74c8c39 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,2 +1,4 @@
 0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
 0002-only-start-corosync.service-if-conf-exists.patch
+0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
+0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
-- 
2.30.2






* [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2
  2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
@ 2021-11-09 12:07 ` Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes Fabian Grünbichler
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:07 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 debian/changelog | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index 018d175..21c7a19 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+corosync (3.1.5-pve2) bullseye; urgency=medium
+
+  * cherry-pick fix for high retransmit load
+
+  * cherry-pick fix for CPG corruption during membership change bug
+
+ -- Proxmox Support Team <support@proxmox.com>  Tue, 9 Nov 2021 11:50:52 +0100
+
 corosync (3.1.5-pve1) bullseye; urgency=medium
 
   * update to v3.1.5 upstream release
-- 
2.30.2






* [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes
  2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2 Fabian Grünbichler
@ 2021-11-09 12:07 ` Fabian Grünbichler
  2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2 Fabian Grünbichler
  2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
  4 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:07 UTC (permalink / raw)
  To: pve-devel

see https://github.com/corosync/corosync/issues/660 as well. these are
already queued for 1.23 and taken straight from stable1-proposed.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 ...eq_num-initialization-race-condition.patch | 53 +++++++++++
 ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
 debian/patches/series                         |  3 +-
 3 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
 create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch

diff --git a/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
new file mode 100644
index 0000000..d01e0d4
--- /dev/null
+++ b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
@@ -0,0 +1,53 @@
+From 7eebe93c5039dad432bdd40101287e7fc04b3d10 Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Mon, 8 Nov 2021 09:14:22 +0100
+Subject: [PATCH 1/2] [host] fix dst_seq_num initialization race condition
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is a potential race condition where the sender
+is overloaded, sending data packets before pings
+can kick in and set the correct dst_seq_num.
+
+if this node is starting up (dst_seq_num = 0),
+it can start rejecing valid packets and get stuck.
+
+Set the dst_seq_num to the first seen packet and
+use that as reference instead.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/host.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+diff --git a/libknet/host.c b/libknet/host.c
+index ec73c0df..6fca01f8 100644
+--- a/libknet/host.c
++++ b/libknet/host.c
+@@ -573,6 +573,21 @@ int _seq_num_lookup(struct knet_host *host, seq_num_t seq_num, int defrag_buf, i
+ 	char *dst_cbuf_defrag = host->circular_buffer_defrag;
+ 	seq_num_t *dst_seq_num = &host->rx_seq_num;
+ 
++	/*
++	 * There is a potential race condition where the sender
++	 * is overloaded, sending data packets before pings
++	 * can kick in and set the correct dst_seq_num.
++	 *
++	 * if this node is starting up (dst_seq_num = 0),
++	 * it can start rejecing valid packets and get stuck.
++	 *
++	 * Set the dst_seq_num to the first seen packet and
++	 * use that as reference instead.
++	 */
++	if (!*dst_seq_num) {
++		*dst_seq_num = seq_num;
++	}
++
+ 	if (clear_buf) {
+ 		_clear_cbuffers(host, seq_num);
+ 	}
+-- 
+2.30.2
+
diff --git a/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
new file mode 100644
index 0000000..c8a9990
--- /dev/null
+++ b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
@@ -0,0 +1,92 @@
+From 1d52003ae7814ebf2b47c1ac3463c7d82486a5fd Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto@redhat.com>
+Date: Sun, 7 Nov 2021 17:02:05 +0100
+Subject: [PATCH 2/2] [udp] use ICMP error messages to trigger faster link down
+ detection
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+this solves a possible race condition when:
+
+- node1 is running
+- node2 very fast
+- node1 does NOT have enough time to detect that node2 has gone
+  and reset the local seq numbers / buffers
+- node1 will start rejecting valid packets from node2
+
+There is still a potential minor race condition where app
+can restart so fast that kernel / network don't have time
+to generate an ICMP error. This will be addressed using
+instance id in onwire v2 protocol, as suggested by Jan F.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
+---
+ libknet/transport_udp.c | 44 +++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 44 insertions(+)
+
+diff --git a/libknet/transport_udp.c b/libknet/transport_udp.c
+index 482c99b1..a1419c89 100644
+--- a/libknet/transport_udp.c
++++ b/libknet/transport_udp.c
+@@ -364,6 +364,46 @@ static int read_errs_from_sock(knet_handle_t knet_h, int sockfd)
+ 									log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s destination unknown", addr_str, strerror(sock_err->ee_errno));
+ 								} else {
+ 									log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s %s", addr_str, strerror(sock_err->ee_errno), addr_remote_str);
++									if ((sock_err->ee_errno == ECONNREFUSED) || /* knet is not running on the other node */
++									    (sock_err->ee_errno == ECONNABORTED) || /* local kernel closed the socket */
++									    (sock_err->ee_errno == ENONET)       || /* network does not exist */
++									    (sock_err->ee_errno == ENETUNREACH)  || /* network unreachable */
++									    (sock_err->ee_errno == EHOSTUNREACH) || /* host unreachable */
++									    (sock_err->ee_errno == EHOSTDOWN)    || /* host down (from kernel/net/ipv4/icmp.c */
++									    (sock_err->ee_errno == ENETDOWN)) {     /* network down */
++										struct knet_host *host = NULL;
++										struct knet_link *kn_link = NULL;
++										int link_idx, found = 0;
++
++										for (host = knet_h->host_head; host != NULL; host = host->next) {
++											for (link_idx = 0; link_idx < KNET_MAX_LINK; link_idx++) {
++												kn_link = &host->link[link_idx];
++												if (kn_link->outsock == sockfd) {
++													if (!cmpaddr(&remote, &kn_link->dst_addr)) {
++														found = 1;
++														break;
++													}
++												}
++											}
++											if (found) {
++												break;
++											}
++										}
++
++										if ((host) && (kn_link) &&
++										    (kn_link->status.connected)) {
++											log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Setting down host %u link %i", host->host_id, kn_link->link_id);
++											/*
++											 * setting transport_connected = 0 will trigger
++											 * thread_heartbeat link_down process.
++											 *
++											 * the process terminates calling into transport_link_down
++											 * below that will set transport_connected = 1
++											 */
++											kn_link->transport_connected = 0;
++										}
++
++									}
+ 								}
+ 							}
+ 							break;
+@@ -436,5 +476,9 @@ int udp_transport_link_dyn_connect(knet_handle_t knet_h, int sockfd, struct knet
+ 
+ int udp_transport_link_is_down(knet_handle_t knet_h, struct knet_link *kn_link)
+ {
++	/*
++	 * see comments about handling ICMP error messages
++	 */
++	kn_link->transport_connected = 1;
+ 	return 0;
+ }
+-- 
+2.30.2
+
diff --git a/debian/patches/series b/debian/patches/series
index 8b13789..16fba19 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1 +1,2 @@
-
+0001-host-fix-dst_seq_num-initialization-race-condition.patch
+0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
-- 
2.30.2






* [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2
  2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
                   ` (2 preceding siblings ...)
  2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes Fabian Grünbichler
@ 2021-11-09 12:07 ` Fabian Grünbichler
  2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
  4 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:07 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 debian/changelog | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index b154415..2ef406a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+kronosnet (1.22-pve2) bullseye; urgency=medium
+
+  * cherry pick fixes for membership change under high network load
+
+ -- Proxmox Support Team <support@proxmox.com>  Tue, 9 Nov 2021 11:44:52 +0100
+
 kronosnet (1.22-pve1) bullseye; urgency=medium
 
   * update to v1.22 upstream release
-- 
2.30.2






* Re: [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes
  2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
                   ` (3 preceding siblings ...)
  2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2 Fabian Grünbichler
@ 2021-11-09 12:31 ` Thomas Lamprecht
  2021-11-09 12:54   ` [pve-devel] applied-series: " Fabian Grünbichler
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Lamprecht @ 2021-11-09 12:31 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 09.11.21 13:07, Fabian Grünbichler wrote:
> culmination of 4 weeks of triaging together with the respective upstream
> devs and endless hours staring at corosync debug traces, this fixes the
> following issues:
> 
> - knet losing join messages if network is overloaded, pushing corosync
>   into a retransmit loop, potentially causing a full-cluster fence event
>   with just a single node acting up
> - corosync potentially corrupting messages during membership changes
> 
> and another one reported by someone else:
> 
> - corosync causing high network load by not holding the token in case
>   messages are queued for retransmission
> 
> all of the fixes are taken from the respective stable queue with
> releases slated for later this week.
> 
> corosync:
> 
> Fabian Grünbichler (2):
>   cherry-pick fixes
>   bump version to 3.1.5-pve2
> 
>  ...cel_hold_on_retransmit-config-option.patch | 132 ++++++++++++++++++
>  ...ch-totempg-buffers-at-the-right-time.patch | 113 +++++++++++++++
>  debian/changelog                              |   8 ++
>  debian/patches/series                         |   2 +
>  4 files changed, 255 insertions(+)
>  create mode 100644 debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
>  create mode 100644 debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
> 
> kronosnet:
> 
> Fabian Grünbichler (2):
>   fix #3672: cherry-pick knet fixes
>   bump version to 1.22-pve2
> 
>  ...eq_num-initialization-race-condition.patch | 53 +++++++++++
>  ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
>  debian/changelog                              |  6 ++
>  debian/patches/series                         |  3 +-
>  4 files changed, 153 insertions(+), 1 deletion(-)
>  create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
>  create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
> 

For all of this:

Acked-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>

Can you go ahead and push + upload packages?





* [pve-devel] applied-series: [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes
  2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
@ 2021-11-09 12:54   ` Fabian Grünbichler
  2021-11-09 13:21     ` Eneko Lacunza
  0 siblings, 1 reply; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 12:54 UTC (permalink / raw)
  To: Proxmox VE development discussion, Thomas Lamprecht

On November 9, 2021 1:31 pm, Thomas Lamprecht wrote:
> On 09.11.21 13:07, Fabian Grünbichler wrote:
>> culmination of 4 weeks of triaging together with the respective upstream
>> devs and endless hours staring at corosync debug traces, this fixes the
>> following issues:
>> 
>> - knet losing join messages if network is overloaded, pushing corosync
>>   into a retransmit loop, potentially causing a full-cluster fence event
>>   with just a single node acting up
>> - corosync potentially corrupting messages during membership changes
>> 
>> and another one reported by someone else:
>> 
>> - corosync causing high network load by not holding the token in case
>>   messages are queued for retransmission
>> 
>> all of the fixes are taken from the respective stable queue with
>> releases slated for later this week.
>> 
>> corosync:
>> 
>> Fabian Grünbichler (2):
>>   cherry-pick fixes
>>   bump version to 3.1.5-pve2
>> 
>>  ...cel_hold_on_retransmit-config-option.patch | 132 ++++++++++++++++++
>>  ...ch-totempg-buffers-at-the-right-time.patch | 113 +++++++++++++++
>>  debian/changelog                              |   8 ++
>>  debian/patches/series                         |   2 +
>>  4 files changed, 255 insertions(+)
>>  create mode 100644 debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
>>  create mode 100644 debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
>> 
>> kronosnet:
>> 
>> Fabian Grünbichler (2):
>>   fix #3672: cherry-pick knet fixes
>>   bump version to 1.22-pve2
>> 
>>  ...eq_num-initialization-race-condition.patch | 53 +++++++++++
>>  ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
>>  debian/changelog                              |  6 ++
>>  debian/patches/series                         |  3 +-
>>  4 files changed, 153 insertions(+), 1 deletion(-)
>>  create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
>>  create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
>> 
> 
> For all of this:
> 
> Acked-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> 
> Can you go ahead and push + upload packages?
> 

done





* Re: [pve-devel] applied-series: [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes
  2021-11-09 12:54   ` [pve-devel] applied-series: " Fabian Grünbichler
@ 2021-11-09 13:21     ` Eneko Lacunza
  2021-11-09 17:39       ` Fabian Grünbichler
  0 siblings, 1 reply; 9+ messages in thread
From: Eneko Lacunza @ 2021-11-09 13:21 UTC (permalink / raw)
  To: pve-devel

Hi,

Nice to see this here, I think we have been affected by this for the past
few weeks (since we upgraded to PVE 7...); I was starting to think we had
a faulty network :)

How can I know when this gets to the community repo?

Thanks

El 9/11/21 a las 13:54, Fabian Grünbichler escribió:
> On November 9, 2021 1:31 pm, Thomas Lamprecht wrote:
>> On 09.11.21 13:07, Fabian Grünbichler wrote:
>>> culmination of 4 weeks of triaging together with the respective upstream
>>> devs and endless hours staring at corosync debug traces, this fixes the
>>> following issues:
>>>
>>> - knet losing join messages if network is overloaded, pushing corosync
>>>    into a retransmit loop, potentially causing a full-cluster fence event
>>>    with just a single node acting up
>>> - corosync potentially corrupting messages during membership changes
>>>
>>> and another one reported by someone else:
>>>
>>> - corosync causing high network load by not holding the token in case
>>>    messages are queued for retransmission
>>>
>>> all of the fixes are taken from the respective stable queue with
>>> releases slated for later this week.
>>>
>>> corosync:
>>>
>>> Fabian Grünbichler (2):
>>>    cherry-pick fixes
>>>    bump version to 3.1.5-pve2
>>>
>>>   ...cel_hold_on_retransmit-config-option.patch | 132 ++++++++++++++++++
>>>   ...ch-totempg-buffers-at-the-right-time.patch | 113 +++++++++++++++
>>>   debian/changelog                              |   8 ++
>>>   debian/patches/series                         |   2 +
>>>   4 files changed, 255 insertions(+)
>>>   create mode 100644 debian/patches/0003-totem-Add-cancel_hold_on_retransmit-config-option.patch
>>>   create mode 100644 debian/patches/0004-totemsrp-Switch-totempg-buffers-at-the-right-time.patch
>>>
>>> kronosnet:
>>>
>>> Fabian Grünbichler (2):
>>>    fix #3672: cherry-pick knet fixes
>>>    bump version to 1.22-pve2
>>>
>>>   ...eq_num-initialization-race-condition.patch | 53 +++++++++++
>>>   ...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
>>>   debian/changelog                              |  6 ++
>>>   debian/patches/series                         |  3 +-
>>>   4 files changed, 153 insertions(+), 1 deletion(-)
>>>   create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
>>>   create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
>>>
>> For all of this:
>>
>> Acked-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
>>
>> Can you go ahead and push + upload packages?
>>
> done
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Eneko Lacunza
Technical Director
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/





* Re: [pve-devel] applied-series: [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes
  2021-11-09 13:21     ` Eneko Lacunza
@ 2021-11-09 17:39       ` Fabian Grünbichler
  0 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2021-11-09 17:39 UTC (permalink / raw)
  To: Proxmox VE development discussion

On November 9, 2021 2:21 pm, Eneko Lacunza wrote:
> Hi,
> 
> Nice to see this here, I think we have been affected by this for the past
> few weeks (since we upgraded to PVE 7...); I was starting to think we had
> a faulty network :)
> 
> How can I know when this gets to the community repo?
> 
> Thanks

it's available on pvetest already
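
roughly, to pull it in from there on a PVE 7 node (repository line
written from memory, double-check it against the docs before enabling):

```
# enable the pvetest repository (assumed standard bullseye layout)
echo "deb http://download.proxmox.com/debian/pve bullseye pvetest" \
    > /etc/apt/sources.list.d/pvetest.list

apt update
# the fixed versions should show up as candidates
apt-cache policy corosync libknet1
# i.e. corosync 3.1.5-pve2 and libknet1 1.22-pve2 (or newer)
```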





end of thread, other threads:[~2021-11-09 17:40 UTC | newest]

Thread overview: 9+ messages
2021-11-09 12:07 [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 1/2] cherry-pick fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH corosync-pve 2/2] bump version to 3.1.5-pve2 Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes Fabian Grünbichler
2021-11-09 12:07 ` [pve-devel] [PATCH kronosnet 2/2] bump version to 1.22-pve2 Fabian Grünbichler
2021-11-09 12:31 ` [pve-devel] [PATCH corosync-pve/kronosnet 0/4] cherry-pick bug fixes Thomas Lamprecht
2021-11-09 12:54   ` [pve-devel] applied-series: " Fabian Grünbichler
2021-11-09 13:21     ` Eneko Lacunza
2021-11-09 17:39       ` Fabian Grünbichler
