public inbox for pve-devel@lists.proxmox.com
From: Alexandre DERUMIER <aderumier@odiso.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	 dietmar <dietmar@proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Mon, 14 Sep 2020 17:45:05 +0200 (CEST)
Message-ID: <1775665592.735772.1600098305930.JavaMail.zimbra@odiso.com>
In-Reply-To: <88fe5075-870d-9197-7c84-71ae8a25e9dd@proxmox.com>

>>Did you get in contact with knet/corosync devs about this? 
>>Because it may well be something their stack is better at handling; maybe 
>>there's also really still a bug, or bad behaviour in some edge cases... 

Not yet. I would like to have more information to submit first, because right now I'm working blind.
I have enabled debug logs on my whole cluster in case it happens again.


BTW, I have noticed something:

corosync is stopped after syslog is stopped, so at shutdown we never get the corosync logs.


I have edited corosync.service:

- After=network-online.target
+ After=network-online.target syslog.target


and now it's logging correctly.
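
If the edit was made in the packaged unit file, a package upgrade may overwrite it. A minimal sketch of the same ordering as a systemd drop-in follows. It assumes the standard drop-in directory for corosync.service and uses a hypothetical file name. syslog.target is carried over unchanged from the edit above; whether it is the right unit to order against on every system is not verified here.

```ini
# /etc/systemd/system/corosync.service.d/log-ordering.conf  (hypothetical file name)
[Unit]
# After= is a list setting, so this line adds to the After= entries already
# present in corosync.service instead of replacing them.
# systemd stops units in the reverse of their start-up order, so ordering
# corosync after syslog.target at boot means corosync is stopped before it
# at shutdown.
After=syslog.target
```

Run `systemctl daemon-reload` afterwards so that systemd picks up the drop-in.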



Now that logging works, I'm also seeing pmxcfs errors when corosync is stopping.
(But no pmxcfs shutdown log.)

Do you think it's possible to have a clean shutdown of pmxcfs first, before stopping corosync?


"
Sep 14 17:23:49 pve corosync[1346]:   [MAIN  ] Node was shut down by a signal
Sep 14 17:23:49 pve systemd[1]: Stopping Corosync Cluster Engine...
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Unloading all Corosync service engines.
Sep 14 17:23:49 pve corosync[1346]:   [QB    ] withdrawing server sockets
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Sep 14 17:23:49 pve pmxcfs[1132]: [confdb] crit: cmap_dispatch failed: 2
Sep 14 17:23:49 pve corosync[1346]:   [QB    ] withdrawing server sockets
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync configuration map access
Sep 14 17:23:49 pve corosync[1346]:   [QB    ] withdrawing server sockets
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync configuration service
Sep 14 17:23:49 pve pmxcfs[1132]: [status] crit: cpg_dispatch failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [status] crit: cpg_leave failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [dcdb] crit: cpg_dispatch failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [dcdb] crit: cpg_leave failed: 2
Sep 14 17:23:49 pve corosync[1346]:   [QB    ] withdrawing server sockets
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Sep 14 17:23:49 pve pmxcfs[1132]: [quorum] crit: quorum_dispatch failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [status] notice: node lost quorum
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync profile loading service
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync resource monitoring service
Sep 14 17:23:49 pve corosync[1346]:   [SERV  ] Service engine unloaded: corosync watchdog service
Sep 14 17:23:49 pve pmxcfs[1132]: [quorum] crit: quorum_initialize failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [quorum] crit: can't initialize service
Sep 14 17:23:49 pve pmxcfs[1132]: [confdb] crit: cmap_initialize failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [confdb] crit: can't initialize service
Sep 14 17:23:49 pve pmxcfs[1132]: [dcdb] notice: start cluster connection
Sep 14 17:23:49 pve pmxcfs[1132]: [dcdb] crit: cpg_initialize failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [dcdb] crit: can't initialize service
Sep 14 17:23:49 pve pmxcfs[1132]: [status] notice: start cluster connection
Sep 14 17:23:49 pve pmxcfs[1132]: [status] crit: cpg_initialize failed: 2
Sep 14 17:23:49 pve pmxcfs[1132]: [status] crit: can't initialize service
Sep 14 17:23:50 pve corosync[1346]:   [MAIN  ] Corosync Cluster Engine exiting normally
"



----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "aderumier" <aderumier@odiso.com>, "dietmar" <dietmar@proxmox.com>
Sent: Monday, 14 September 2020 10:51:03
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/14/20 10:27 AM, Alexandre DERUMIER wrote: 
>> I wonder if something like pacemaker sbd could be implemented in proxmox as extra layer of protection ? 
> 
>>> AFAIK Thomas already has patches to implement active fencing. 
> 
>>> But IMHO this will not solve the corosync problems.. 
> 
> Yes, sure. I'd really like to have 2 different sources of verification, with different paths/software, to avoid this kind of bug. 
> (shit happens, Murphy's law ;) 

You would then need at least three, and if one has a bug that floods the network, 
then in a lot of setups (not having beefy switches like you ;) the other two will be 
taken down as well, as either memory or the system stack gets overloaded. 

> 
> as we say in French "ceinture & bretelles" -> "belt and braces" 
> 
> 
> BTW, 
> a user has reported a new corosync problem here: 
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871 
> (Sounds like the bug that I had 6 months ago, with a corosync bug flooding a lot of UDP packets, but not the same bug I have here) 

Did you get in contact with knet/corosync devs about this? 

Because it may well be something their stack is better at handling; maybe 
there's also really still a bug, or bad behaviour in some edge cases... 



