From: dietmar <dietmar@proxmox.com>
To: Alexandre DERUMIER <aderumier@odiso.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Mon, 7 Sep 2020 10:18:42 +0200 (CEST) [thread overview]
Message-ID: <72727125.827.1599466723564@webmail.proxmox.com> (raw)
In-Reply-To: <1661182651.406890.1599463180810.JavaMail.zimbra@odiso.com>
There is a similar report in the forum:
https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111
No HA involved...
> On 09/07/2020 9:19 AM Alexandre DERUMIER <aderumier@odiso.com> wrote:
>
>
> >>Indeed, this should not happen. Do you use a separate network for corosync?
>
> No, I use a 2x 40Gb LACP link.
>
> >>was there high traffic on the network?
>
> But I'm far from saturating them (in pps or throughput); I'm around 3-4 Gbps.
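(For context on the "separate network" question: with the knet transport, corosync can be given a dedicated link on its own subnet in addition to the main network. A minimal sketch of what that would look like in corosync.conf, using a hypothetical 10.3.95.0/24 corosync-only subnet; every node would need its own ring1_addr entry:)

  node {
    name: m6kvm1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: m6kvm1
    # hypothetical address on the dedicated corosync subnet
    ring1_addr: 10.3.95.1
  }

Per-link knobs (e.g. knet_link_priority) could then be set in a totem interface section with linknumber: 1, but that is optional.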
>
>
> The cluster is 14 nodes, with around 1000 VMs (with HA enabled on all VMs).
>
>
> From my understanding, watchdog-mux was still running, as the watchdog reset only after 1 min and not 10 s,
> so it looks like the lrm was blocked and not sending its watchdog timer reset to watchdog-mux.
>
>
> I'll do tests with softdog + soft_noboot=1, so if that happens again, I'll be able to debug.
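For reference, the mechanism under discussion works roughly like this: whichever process owns /dev/watchdog must keep "petting" it within the configured timeout, or the driver (a hardware watchdog, or the softdog module) resets the machine. Below is a minimal C sketch of that keep-alive loop using the standard Linux watchdog ioctls; this is only the generic kernel interface (watchdog-mux and pve-ha-lrm layer their own client/server protocol on top of it), and the 10 s / 5 s values are illustrative.

    /* Keep-alive loop for the Linux watchdog device: if this loop ever
     * blocks for longer than the timeout, the watchdog fires and the
     * node is reset (unless softdog was loaded with soft_noboot=1,
     * in which case it only logs the expiry). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/watchdog.h>

    int main(void)
    {
        int fd = open("/dev/watchdog", O_WRONLY);
        if (fd < 0) {
            perror("open /dev/watchdog");
            return 1;
        }

        int timeout = 10;                       /* seconds until reset */
        ioctl(fd, WDIOC_SETTIMEOUT, &timeout);  /* driver may round the value */

        for (;;) {
            ioctl(fd, WDIOC_KEEPALIVE, 0);      /* restart the countdown */
            sleep(5);                           /* must stay well below the timeout */
        }
    }

That is also why softdog + soft_noboot=1 is useful for debugging: the expiry still shows up in the kernel log, but the node stays up so the blocked lrm can be inspected.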
>
>
>
> >>What kind of maintenance was the reason for the shutdown?
>
> RAM upgrade. (The server was running fine before the shutdown, no hardware problem.)
> (I had just shut down the server and had not started it again yet when the problem occurred.)
>
>
>
> >>Do you use the default corosync timeout values, or do you have a special setup?
>
>
> No special tuning, default values. (I haven't had any retransmits in the logs for months.)
>
> >>Can you please post the full corosync config?
>
> (I have verified: the running version of corosync was 3.0.3, with libknet 1.15.)
>
>
> Here is the config:
>
> "
> logging {
>   debug: off
>   to_syslog: yes
> }
>
> nodelist {
>   node {
>     name: m6kvm1
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: m6kvm1
>   }
>   node {
>     name: m6kvm10
>     nodeid: 10
>     quorum_votes: 1
>     ring0_addr: m6kvm10
>   }
>   node {
>     name: m6kvm11
>     nodeid: 11
>     quorum_votes: 1
>     ring0_addr: m6kvm11
>   }
>   node {
>     name: m6kvm12
>     nodeid: 12
>     quorum_votes: 1
>     ring0_addr: m6kvm12
>   }
>   node {
>     name: m6kvm13
>     nodeid: 13
>     quorum_votes: 1
>     ring0_addr: m6kvm13
>   }
>   node {
>     name: m6kvm14
>     nodeid: 14
>     quorum_votes: 1
>     ring0_addr: m6kvm14
>   }
>   node {
>     name: m6kvm2
>     nodeid: 2
>     quorum_votes: 1
>     ring0_addr: m6kvm2
>   }
>   node {
>     name: m6kvm3
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: m6kvm3
>   }
>   node {
>     name: m6kvm4
>     nodeid: 4
>     quorum_votes: 1
>     ring0_addr: m6kvm4
>   }
>   node {
>     name: m6kvm5
>     nodeid: 5
>     quorum_votes: 1
>     ring0_addr: m6kvm5
>   }
>   node {
>     name: m6kvm6
>     nodeid: 6
>     quorum_votes: 1
>     ring0_addr: m6kvm6
>   }
>   node {
>     name: m6kvm7
>     nodeid: 7
>     quorum_votes: 1
>     ring0_addr: m6kvm7
>   }
>   node {
>     name: m6kvm8
>     nodeid: 8
>     quorum_votes: 1
>     ring0_addr: m6kvm8
>   }
>   node {
>     name: m6kvm9
>     nodeid: 9
>     quorum_votes: 1
>     ring0_addr: m6kvm9
>   }
> }
>
> quorum {
>   provider: corosync_votequorum
> }
>
> totem {
>   cluster_name: m6kvm
>   config_version: 19
>   interface {
>     bindnetaddr: 10.3.94.89
>     ringnumber: 0
>   }
>   ip_version: ipv4
>   secauth: on
>   transport: knet
>   version: 2
> }
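On the "default timeout values" point: the totem section above sets no timing options, so corosync's built-in defaults apply (in corosync 3 the effective token timeout also grows with the node count via token_coefficient; see corosync.conf(5)). Explicitly tuning the timeouts would look roughly like the sketch below; the values are purely illustrative, not a recommendation.

  totem {
    cluster_name: m6kvm
    # config_version must be incremented on every change
    config_version: 20
    ip_version: ipv4
    secauth: on
    transport: knet
    version: 2
    # token timeout in ms (corosync 3 additionally scales this with node count)
    token: 3000
    token_retransmits_before_loss_const: 10
    # defaults to 1.2 * token when not set
    consensus: 3600
  }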
>
>
>
> ----- Original Message -----
> From: "dietmar" <dietmar@proxmox.com>
> To: "aderumier" <aderumier@odiso.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Sunday, 6 September 2020 14:14:06
> Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>
> > Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
> > Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
>
> Indeed, this should not happen. Do you use a separate network for corosync? Or
> was there high traffic on the network? What kind of maintenance was the reason
> for the shutdown?
Thread overview: 84+ messages
2020-09-03 14:11 Alexandre DERUMIER
2020-09-04 12:29 ` Alexandre DERUMIER
2020-09-04 15:42 ` Dietmar Maurer
2020-09-05 13:32 ` Alexandre DERUMIER
2020-09-05 15:23 ` dietmar
2020-09-05 17:30 ` Alexandre DERUMIER
2020-09-06 4:21 ` dietmar
2020-09-06 5:36 ` Alexandre DERUMIER
2020-09-06 6:33 ` Alexandre DERUMIER
2020-09-06 8:43 ` Alexandre DERUMIER
2020-09-06 12:14 ` dietmar
2020-09-06 12:19 ` dietmar
2020-09-07 7:00 ` Thomas Lamprecht
2020-09-07 7:19 ` Alexandre DERUMIER
2020-09-07 8:18 ` dietmar [this message]
2020-09-07 9:32 ` Alexandre DERUMIER
2020-09-07 13:23 ` Alexandre DERUMIER
2020-09-08 4:41 ` dietmar
2020-09-08 7:11 ` Alexandre DERUMIER
2020-09-09 20:05 ` Thomas Lamprecht
2020-09-10 4:58 ` Alexandre DERUMIER
2020-09-10 8:21 ` Thomas Lamprecht
2020-09-10 11:34 ` Alexandre DERUMIER
2020-09-10 18:21 ` Thomas Lamprecht
2020-09-14 4:54 ` Alexandre DERUMIER
2020-09-14 7:14 ` Dietmar Maurer
2020-09-14 8:27 ` Alexandre DERUMIER
2020-09-14 8:51 ` Thomas Lamprecht
2020-09-14 15:45 ` Alexandre DERUMIER
2020-09-15 5:45 ` dietmar
2020-09-15 6:27 ` Alexandre DERUMIER
2020-09-15 7:13 ` dietmar
2020-09-15 8:42 ` Alexandre DERUMIER
2020-09-15 9:35 ` Alexandre DERUMIER
2020-09-15 9:46 ` Thomas Lamprecht
2020-09-15 10:15 ` Alexandre DERUMIER
2020-09-15 11:04 ` Alexandre DERUMIER
2020-09-15 12:49 ` Alexandre DERUMIER
2020-09-15 13:00 ` Thomas Lamprecht
2020-09-15 14:09 ` Alexandre DERUMIER
2020-09-15 14:19 ` Alexandre DERUMIER
2020-09-15 14:32 ` Thomas Lamprecht
2020-09-15 14:57 ` Alexandre DERUMIER
2020-09-15 15:58 ` Alexandre DERUMIER
2020-09-16 7:34 ` Alexandre DERUMIER
2020-09-16 7:58 ` Alexandre DERUMIER
2020-09-16 8:30 ` Alexandre DERUMIER
2020-09-16 8:53 ` Alexandre DERUMIER
[not found] ` <1894376736.864562.1600253445817.JavaMail.zimbra@odiso.com>
2020-09-16 13:15 ` Alexandre DERUMIER
2020-09-16 14:45 ` Thomas Lamprecht
2020-09-16 15:17 ` Alexandre DERUMIER
2020-09-17 9:21 ` Fabian Grünbichler
2020-09-17 9:59 ` Alexandre DERUMIER
2020-09-17 10:02 ` Alexandre DERUMIER
2020-09-17 11:35 ` Thomas Lamprecht
2020-09-20 23:54 ` Alexandre DERUMIER
2020-09-22 5:43 ` Alexandre DERUMIER
2020-09-24 14:02 ` Fabian Grünbichler
2020-09-24 14:29 ` Alexandre DERUMIER
2020-09-24 18:07 ` Alexandre DERUMIER
2020-09-25 6:44 ` Alexandre DERUMIER
2020-09-25 7:15 ` Alexandre DERUMIER
2020-09-25 9:19 ` Fabian Grünbichler
2020-09-25 9:46 ` Alexandre DERUMIER
2020-09-25 12:51 ` Fabian Grünbichler
2020-09-25 16:29 ` Alexandre DERUMIER
2020-09-28 9:17 ` Fabian Grünbichler
2020-09-28 9:35 ` Alexandre DERUMIER
2020-09-28 15:59 ` Alexandre DERUMIER
2020-09-29 5:30 ` Alexandre DERUMIER
2020-09-29 8:51 ` Fabian Grünbichler
2020-09-29 9:37 ` Alexandre DERUMIER
2020-09-29 10:52 ` Alexandre DERUMIER
2020-09-29 11:43 ` Alexandre DERUMIER
2020-09-29 11:50 ` Alexandre DERUMIER
2020-09-29 13:28 ` Fabian Grünbichler
2020-09-29 13:52 ` Alexandre DERUMIER
2020-09-30 6:09 ` Alexandre DERUMIER
2020-09-30 6:26 ` Thomas Lamprecht
2020-09-15 7:58 ` Thomas Lamprecht
2020-12-29 14:21 ` Josef Johansson
2020-09-04 15:46 ` Alexandre DERUMIER
2020-09-30 15:50 ` Thomas Lamprecht
2020-10-15 9:16 ` Eneko Lacunza