From: Alexandre DERUMIER <aderumier@odiso.com>
To: dietmar <dietmar@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Mon, 7 Sep 2020 11:32:13 +0200 (CEST)
Message-ID: <1066029576.414316.1599471133463.JavaMail.zimbra@odiso.com>
In-Reply-To: <72727125.827.1599466723564@webmail.proxmox.com>

>>https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111 
>>
>>No HA involved... 

I already helped this user a few weeks ago:

https://forum.proxmox.com/threads/proxmox-6-2-4-cluster-die-node-auto-reboot-need-help.74643/#post-333093

HA was active at that time. (Maybe the watchdog was still running; I'm not sure whether the LRM disables the watchdog once HA has been disabled on all VMs.)


----- Original Message -----
From: "dietmar" <dietmar@proxmox.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
Sent: Monday, 7 September 2020 10:18:42
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

There is a similar report in the forum: 

https://forum.proxmox.com/threads/cluster-die-after-adding-the-39th-node-proxmox-is-not-stable.75506/#post-336111 

No HA involved... 


> On 09/07/2020 9:19 AM Alexandre DERUMIER <aderumier@odiso.com> wrote: 
> 
> 
> >>Indeed, this should not happen. Do you use a separate network for corosync?
>
> No, I use a 2x40Gb LACP link.
>
> >>was there high traffic on the network?
>
> But I'm far from saturating it, in pps or throughput (I'm at around 3-4 Gbps).
> 
> 
> The cluster has 14 nodes, with around 1000 VMs (with HA enabled on all VMs).
>
>
> From my understanding, watchdog-mux was still running, as the watchdog reset the node only after 1 min and not 10 s,
> so it looks like the LRM was blocked and not sending watchdog timer resets to watchdog-mux.
>
>
> I'll run tests with softdog + soft_noboot=1, so if it happens again, I'll be able to debug.
> 
> 
> 
> >>What kind of maintenance was the reason for the shutdown? 
> 
> RAM upgrade. (The server was running fine before the shutdown, no hardware problem.)
> (I had just shut down the server and had not started it again yet when the problem occurred.)
> 
> 
> 
> >>Do you use the default corosync timeout values, or do you have a special setup? 
> 
> 
> No special tuning, default values. (I haven't had any retransmits in the logs for months.)
>
> >>Can you please post the full corosync config?
>
> (I have verified: the running version was corosync 3.0.3 with libknet 1.15.)
>
>
> Here is the config:
> 
> " 
> logging {
>   debug: off
>   to_syslog: yes
> }
>
> nodelist {
>   node {
>     name: m6kvm1
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: m6kvm1
>   }
>   node {
>     name: m6kvm10
>     nodeid: 10
>     quorum_votes: 1
>     ring0_addr: m6kvm10
>   }
>   node {
>     name: m6kvm11
>     nodeid: 11
>     quorum_votes: 1
>     ring0_addr: m6kvm11
>   }
>   node {
>     name: m6kvm12
>     nodeid: 12
>     quorum_votes: 1
>     ring0_addr: m6kvm12
>   }
>   node {
>     name: m6kvm13
>     nodeid: 13
>     quorum_votes: 1
>     ring0_addr: m6kvm13
>   }
>   node {
>     name: m6kvm14
>     nodeid: 14
>     quorum_votes: 1
>     ring0_addr: m6kvm14
>   }
>   node {
>     name: m6kvm2
>     nodeid: 2
>     quorum_votes: 1
>     ring0_addr: m6kvm2
>   }
>   node {
>     name: m6kvm3
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: m6kvm3
>   }
>   node {
>     name: m6kvm4
>     nodeid: 4
>     quorum_votes: 1
>     ring0_addr: m6kvm4
>   }
>   node {
>     name: m6kvm5
>     nodeid: 5
>     quorum_votes: 1
>     ring0_addr: m6kvm5
>   }
>   node {
>     name: m6kvm6
>     nodeid: 6
>     quorum_votes: 1
>     ring0_addr: m6kvm6
>   }
>   node {
>     name: m6kvm7
>     nodeid: 7
>     quorum_votes: 1
>     ring0_addr: m6kvm7
>   }
>   node {
>     name: m6kvm8
>     nodeid: 8
>     quorum_votes: 1
>     ring0_addr: m6kvm8
>   }
>   node {
>     name: m6kvm9
>     nodeid: 9
>     quorum_votes: 1
>     ring0_addr: m6kvm9
>   }
> }
>
> quorum {
>   provider: corosync_votequorum
> }
>
> totem {
>   cluster_name: m6kvm
>   config_version: 19
>   interface {
>     bindnetaddr: 10.3.94.89
>     ringnumber: 0
>   }
>   ip_version: ipv4
>   secauth: on
>   transport: knet
>   version: 2
> }
> 
> 
> 
> ----- Original Message -----
> From: "dietmar" <dietmar@proxmox.com>
> To: "aderumier" <aderumier@odiso.com>, "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Sunday, 6 September 2020 14:14:06
> Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
> 
> > Sep 3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds) 
> > Sep 3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds) 
> 
> Indeed, this should not happen. Do you use a separate network for corosync? Or
> was there high traffic on the network? What kind of maintenance was the reason 
> for the shutdown? 
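
A side note on the 14-node config quoted above: the following is a minimal sketch, assuming default votequorum behaviour (quorum = expected_votes // 2 + 1, with no two_node, last_man_standing or similar options, none of which appear in the quoted quorum section), of why one clean shutdown should not cost quorum by itself. The helper names parse_votes and quorum_threshold are hypothetical, and SAMPLE is regenerated from the node names in the quoted nodelist rather than read from a real corosync.conf.

```python
import re

def parse_votes(conf_text):
    """Return {node name: quorum_votes} for each node block in a nodelist."""
    votes = {}
    # Match only "node { ... }" blocks; "nodelist {" is skipped because
    # "node" must be followed by optional whitespace and an opening brace.
    for block in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        fields = dict(re.findall(r"(\w+):\s*(\S+)", block))
        votes[fields["name"]] = int(fields.get("quorum_votes", 1))
    return votes

def quorum_threshold(expected_votes):
    """Simple majority, as used by votequorum when no special options are set."""
    return expected_votes // 2 + 1

# Illustrative input: 14 nodes named m6kvm1..m6kvm14 with one vote each,
# mirroring the quoted nodelist (assumption: no other vote sources).
SAMPLE = "nodelist {\n" + "".join(
    f"  node {{\n    name: m6kvm{i}\n    nodeid: {i}\n"
    f"    quorum_votes: 1\n    ring0_addr: m6kvm{i}\n  }}\n"
    for i in range(1, 15)
) + "}\n"

votes = parse_votes(SAMPLE)
expected = sum(votes.values())
needed = quorum_threshold(expected)
remaining = expected - votes["m6kvm1"]  # any single node shut down cleanly

print("expected votes:", expected)          # expected votes: 14
print("quorum:", needed)                    # quorum: 8
print("still quorate:", remaining >= needed)  # still quorate: True
```

With 13 of 14 votes remaining, vote counting alone does not explain the break, so the stalled LRM/CRM loops are the more interesting lead.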



