public inbox for pve-devel@lists.proxmox.com
From: Alexandre DERUMIER <aderumier@odiso.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: dietmar <dietmar@proxmox.com>, pve-devel <pve-devel@pve.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Sun, 6 Sep 2020 10:43:36 +0200 (CEST)
Message-ID: <1059698258.392627.1599381816979.JavaMail.zimbra@odiso.com>
In-Reply-To: <570223166.391607.1599370570342.JavaMail.zimbra@odiso.com>

Maybe something interesting: the only surviving node was node7, and it was the CRM master.

I'm also seeing the CRM disabling the watchdog, and also some "loop take too long" messages.



(some migration logs from node2 to node1 before the maintenance)
Sep  3 10:36:29 m6kvm7 pve-ha-crm[16196]: service 'vm:992': state changed from 'migrate' to 'started'  (node = m6kvm1)
Sep  3 10:36:29 m6kvm7 pve-ha-crm[16196]: service 'vm:993': state changed from 'migrate' to 'started'  (node = m6kvm1)
Sep  3 10:36:29 m6kvm7 pve-ha-crm[16196]: service 'vm:997': state changed from 'migrate' to 'started'  (node = m6kvm1)
....

Sep  3 10:40:41 m6kvm7 pve-ha-crm[16196]: node 'm6kvm2': state changed from 'online' => 'unknown'
Sep  3 10:40:50 m6kvm7 pve-ha-crm[16196]: got unexpected error - error during cfs-locked 'domain-ha' operation: no quorum!
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: lost lock 'ha_agent_m6kvm7_lock - cfs lock update failed - Permission denied
Sep  3 10:40:56 m6kvm7 pve-ha-lrm[16140]: status change active => lost_agent_lock
Sep  3 10:40:56 m6kvm7 pve-ha-crm[16196]: status change master => lost_manager_lock
Sep  3 10:40:56 m6kvm7 pve-ha-crm[16196]: watchdog closed (disabled)
Sep  3 10:40:56 m6kvm7 pve-ha-crm[16196]: status change lost_manager_lock => wait_for_quorum



Timing on the other nodes
-------------------------

10:39:16 ->  node2 shutdown, leaves corosync

10:40:25 -> other nodes rebooted by watchdog
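
To put the two timelines side by side, here is a rough Python sketch (not part of any Proxmox tooling) that subtracts the reported loop duration from the log timestamp to estimate when each stalled loop iteration began on m6kvm7. It assumes the "loop take too long" message is logged at the end of the iteration, and it takes the year from the date of this mail since syslog lines carry none; both are assumptions on my side.

```python
import re
from datetime import datetime, timedelta

YEAR = 2020  # assumption: syslog lines have no year, taken from the mail date

# The two "loop take too long" lines from m6kvm7, copied from the log above.
LOG_LINES = [
    "Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)",
    "Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)",
]

PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<daemon>[\w-]+)\[\d+\]: loop take too long \((?P<secs>\d+) seconds\)"
)


def parse_ts(text):
    """Parse a syslog timestamp such as 'Sep  3 10:40:51' (padding collapsed)."""
    return datetime.strptime(f"{YEAR} {' '.join(text.split())}", "%Y %b %d %H:%M:%S")


def stalled_loop_starts(lines):
    """Map daemon name -> estimated start time of the slow loop iteration."""
    starts = {}
    for line in lines:
        match = PATTERN.match(line)
        if match:
            logged_at = parse_ts(match["ts"])
            starts[match["daemon"]] = logged_at - timedelta(seconds=int(match["secs"]))
    return starts


# Timings reported for the other nodes.
node2_left = datetime(YEAR, 9, 3, 10, 39, 16)
watchdog_reboot = datetime(YEAR, 9, 3, 10, 40, 25)

starts = stalled_loop_starts(LOG_LINES)
for daemon, start in sorted(starts.items()):
    offset = (start - node2_left).total_seconds()
    print(f"{daemon}: loop began ~{start:%H:%M:%S} ({offset:+.0f}s relative to node2 leaving)")

gap = (watchdog_reboot - node2_left).total_seconds()
print(f"node2 leave -> watchdog reboots: {gap:.0f}s")
```

If the assumption holds, both slow loops on node7 began a few seconds after node2 left corosync, and the other nodes were rebooted 69 seconds after that leave. This only lines up the timestamps; it says nothing about the cause.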


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "dietmar" <dietmar@proxmox.com>
Cc: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Sunday, 6 September 2020 07:36:10
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

>>But the pve logs look ok, and there is no indication 
>>that we stopped updating the watchdog. So why did the 
>>watchdog trigger? Maybe an IPMI bug? 

Do you mean an IPMI bug on all 13 servers at the same time?
(I also have 2 Supermicro servers in this cluster, but they use the same IPMI watchdog driver, ipmi_watchdog.)



I had the same kind of bug once (when stopping a server), on another cluster, 6 months ago.
This was without HA and with a different version of corosync, and that time I was really seeing a quorum split in the corosync logs of the servers.


I'll try to reproduce this on a virtual cluster with 14 nodes (I don't have enough hardware).


Could it be a bug in the Proxmox HA code, where the watchdog is no longer reset by the LRM?

----- Original Message -----
From: "dietmar" <dietmar@proxmox.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Sunday, 6 September 2020 06:21:55
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

> >>So you are using ipmi hardware watchdog? 
> 
> yes, I'm using dell idrac ipmi card watchdog 

But the pve logs look ok, and there is no indication 
that we stopped updating the watchdog. So why did the 
watchdog trigger? Maybe an IPMI bug? 


_______________________________________________ 
pve-devel mailing list 
pve-devel@lists.proxmox.com 
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 



