From: dietmar <dietmar@proxmox.com>
To: Alexandre DERUMIER <aderumier@odiso.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Mon, 7 Sep 2020 10:18:42 +0200 (CEST)
Message-ID: <72727125.827.1599466723564@webmail.proxmox.com>
In-Reply-To: <1661182651.406890.1599463180810.JavaMail.zimbra@odiso.com>

There is a similar report in the forum:

https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111

No HA involved...


> On 09/07/2020 9:19 AM Alexandre DERUMIER <aderumier@odiso.com> wrote:
> 
>  
> >>Indeed, this should not happen. Do you use a separate network for corosync?
> 
> No, I use a 2x40Gb LACP link.
> 
> >>was there high traffic on the network? 
> 
> but I'm far from saturating them (in pps or throughput); I'm around 3-4 Gbps.
> 
> 
> The cluster has 14 nodes, with around 1000 VMs (with HA enabled on all VMs).
> 
> 
> From my understanding, watchdog-mux was still running, as the watchdog reset only after 1 min and not 10 s,
>  so it looks like the LRM was blocked and not sending watchdog timer resets to watchdog-mux.
> 
> 
> I'll do tests with softdog + soft_noboot=1, so if that happens again, I'll be able to debug.
> 
> 
> 
> >>What kind of maintenance was the reason for the shutdown?
> 
> RAM upgrade. (The server was running OK before the shutdown, no hardware problem.)
> (I had just shut down the server and had not started it again yet when the problem occurred.)
> 
> 
> 
> >>Do you use the default corosync timeout values, or do you have a special setup?
> 
> 
> No special tuning, default values. (I haven't had any retransmits in the logs for months.)
> 
> >>Can you please post the full corosync config?
> 
> (I have verified: the running version of corosync was 3.0.3 with libknet 1.15)
> 
> 
> Here is the config:
> 
> "
> logging {
>   debug: off
>   to_syslog: yes
> }
> 
> nodelist {
>   node {
>     name: m6kvm1
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: m6kvm1
>   }
>   node {
>     name: m6kvm10
>     nodeid: 10
>     quorum_votes: 1
>     ring0_addr: m6kvm10
>   }
>   node {
>     name: m6kvm11
>     nodeid: 11
>     quorum_votes: 1
>     ring0_addr: m6kvm11
>   }
>   node {
>     name: m6kvm12
>     nodeid: 12
>     quorum_votes: 1
>     ring0_addr: m6kvm12
>   }
>   node {
>     name: m6kvm13
>     nodeid: 13
>     quorum_votes: 1
>     ring0_addr: m6kvm13
>   }
>   node {
>     name: m6kvm14
>     nodeid: 14
>     quorum_votes: 1
>     ring0_addr: m6kvm14
>   }
>   node {
>     name: m6kvm2
>     nodeid: 2
>     quorum_votes: 1
>     ring0_addr: m6kvm2
>   }
>   node {
>     name: m6kvm3
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: m6kvm3
>   }
>   node {
>     name: m6kvm4
>     nodeid: 4
>     quorum_votes: 1
>     ring0_addr: m6kvm4
>   }
>   node {
>     name: m6kvm5
>     nodeid: 5
>     quorum_votes: 1
>     ring0_addr: m6kvm5
>   }
>   node {
>     name: m6kvm6
>     nodeid: 6
>     quorum_votes: 1
>     ring0_addr: m6kvm6
>   }
>   node {
>     name: m6kvm7
>     nodeid: 7
>     quorum_votes: 1
>     ring0_addr: m6kvm7
>   }
> 
>   node {
>     name: m6kvm8
>     nodeid: 8
>     quorum_votes: 1
>     ring0_addr: m6kvm8
>   }
>   node {
>     name: m6kvm9
>     nodeid: 9
>     quorum_votes: 1
>     ring0_addr: m6kvm9
>   }
> }
> 
> quorum {
>   provider: corosync_votequorum
> }
> 
> totem {
>   cluster_name: m6kvm
>   config_version: 19
>   interface {
>     bindnetaddr: 10.3.94.89
>     ringnumber: 0
>   }
>   ip_version: ipv4
>   secauth: on
>   transport: knet
>   version: 2
> }
> 
> 
> 
> ----- Original Message -----
> From: "dietmar" <dietmar@proxmox.com>
> To: "aderumier" <aderumier@odiso.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Sunday, 6 September 2020 14:14:06
> Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
> 
> > Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds) 
> > Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds) 
> 
> Indeed, this should not happen. Do you use a separate network for corosync? Or
> was there high traffic on the network? What kind of maintenance was the reason 
> for the shutdown?
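
For reference, a rough sketch of the quorum arithmetic behind "this should
not happen". It assumes plain votequorum majority rules (quorum is
floor(expected_votes / 2) + 1) and no special votequorum options, which
matches the quorum section quoted above. The helper names total_votes,
majority and is_quorate are made up for illustration, and the nodelist is
rebuilt programmatically from the posted config rather than read from a
real /etc/corosync/corosync.conf.

```python
import re


def total_votes(corosync_conf: str) -> int:
    """Sum every quorum_votes value found in a corosync.conf text."""
    matches = re.findall(
        r"^\s*quorum_votes:\s*(\d+)\s*$", corosync_conf, re.MULTILINE
    )
    return sum(int(v) for v in matches)


def majority(expected_votes: int) -> int:
    """Votes needed for quorum under simple majority rules."""
    return expected_votes // 2 + 1


def is_quorate(live_votes: int, expected_votes: int) -> bool:
    """True if the remaining votes still reach the majority."""
    return live_votes >= majority(expected_votes)


# Rebuild a nodelist shaped like the one posted: m6kvm1..m6kvm14, one vote each.
node_blocks = "\n".join(
    f"  node {{\n"
    f"    name: m6kvm{i}\n"
    f"    nodeid: {i}\n"
    f"    quorum_votes: 1\n"
    f"    ring0_addr: m6kvm{i}\n"
    f"  }}"
    for i in range(1, 15)
)
conf = f"nodelist {{\n{node_blocks}\n}}\n"

expected = total_votes(conf)
print("expected votes:", expected)
print("votes needed for quorum:", majority(expected))
print("quorate with one node down:", is_quorate(expected - 1, expected))
```

With 14 votes the majority is 8, so 13 remaining nodes are well above the
threshold. Under these assumptions a single clean shutdown cannot cost the
cluster its quorum on vote count alone, which points at the membership or
messaging layer instead.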



