From: Alexandre DERUMIER <aderumier@odiso.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Wed, 30 Sep 2020 08:09:15 +0200 (CEST)
Message-ID: <1030723475.2207793.1601446155192.JavaMail.zimbra@odiso.com>
In-Reply-To: <2038102437.1994930.1601387538948.JavaMail.zimbra@odiso.com>
Hi,
some news: my latest test has been running for 14h now, and I haven't had any problem :)
So it seems that it is indeed fixed! Congratulations!
I wonder if it could be related to this forum user:
https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/
His problem is that after a corosync lag (he has one cluster stretched across 2 DCs, 10km apart, so I think he sometimes gets some small lag),
one node floods the other nodes with a lot of UDP packets. (This makes things worse, as corosync's CPU goes to 100% / overloaded, and then it can't see the other nodes.)
I had this problem 6 months ago after shutting down a node; that's why I'm thinking it could "maybe" be related.
So I wonder if it could be the same pmxcfs bug, with something looping or sending the same packets again and again. (A hypothetical sketch of that kind of race follows below.)
The forum user seems to hit the problem several times a week, so maybe he'll be able to test the newly fixed pmxcfs and tell us if it fixes this bug too.
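To illustrate the kind of failure I mean, here is a purely hypothetical sketch (NOT actual pmxcfs code): if two threads share one CPG handle and call a non-thread-safe corosync function without a lock, that is a data race, and corrupted internal library state (for example a send path that keeps retransmitting) is one imaginable outcome.

/* Hypothetical sketch, NOT actual pmxcfs code. Assumes the handle was
 * set up via cpg_initialize() and cpg_join() beforehand. Two threads
 * enter the non-thread-safe library call at the same time. */
#include <pthread.h>
#include <string.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

static cpg_handle_t handle; /* assumed initialized and joined elsewhere */

static void *sender(void *arg)
{
    char *msg = arg;
    struct iovec iov = { .iov_base = msg, .iov_len = strlen(msg) + 1 };
    /* RACE: no lock, so two threads can be inside the library at once. */
    cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sender, "state update A");
    pthread_create(&t2, NULL, sender, "state update B");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}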
----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Sent: Tuesday, 29 September 2020 15:52:18
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>>huge thanks for all the work on this btw!
huge thanks to you! ;)
>>I think I've found a likely culprit (a missing lock around a
>>non-thread-safe corosync library call) based on the last logs (which
>>were now finally complete!).
YES :)
>>if feedback from your end is positive, I'll whip up a proper patch
>>tomorrow or on Thursday.
I'm going to launch a new test right now!
----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Sent: Tuesday, 29 September 2020 15:28:19
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
huge thanks for all the work on this btw!
I think I've found a likely culprit (a missing lock around a
non-thread-safe corosync library call) based on the last logs (which
were now finally complete!).
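in rough terms, such a fix serializes the library call behind a mutex. a sketch only, assuming the unguarded call is something like cpg_mcast_joined(); the actual call site in pmxcfs may differ:

/* Sketch only -- assumes the unguarded, non-thread-safe call is
 * cpg_mcast_joined(); the real call site in pmxcfs may differ. */
#include <pthread.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

static pthread_mutex_t cpg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every thread that wants to multicast goes through this wrapper, so
 * at most one thread is inside the corosync library at a time. */
static cs_error_t
locked_cpg_mcast(cpg_handle_t handle, const struct iovec *iov,
                 unsigned int iov_len)
{
    pthread_mutex_lock(&cpg_lock);
    cs_error_t res = cpg_mcast_joined(handle, CPG_TYPE_AGREED, iov, iov_len);
    pthread_mutex_unlock(&cpg_lock);
    return res;
}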
rebuilt packages with a proof-of-concept fix:
23b03a48d3aa9c14e86fe8cf9bbb7b00bd8fe9483084b9e0fd75fd67f29f10bec00e317e2a66758713050f36c165d72f107ee3449f9efeb842d3a57c25f8bca7 pve-cluster_6.1-8_amd64.deb
9e1addd676513b176f5afb67cc6d85630e7da9bbbf63562421b4fd2a3916b3b2af922df555059b99f8b0b9e64171101a1c9973846e25f9144ded9d487450baef pve-cluster-dbgsym_6.1-8_amd64.deb
I removed some logging statements which are no longer needed, so output
is a bit less verbose again. if you are not able to trigger the issue
with this package, feel free to remove the -debug and let it run for a
little longer without the massive logs.
if feedback from your end is positive, I'll whip up a proper patch
tomorrow or on Thursday.