From: Alexandre DERUMIER <aderumier@odiso.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	 dietmar <dietmar@proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Thu, 10 Sep 2020 06:58:12 +0200 (CEST)
Message-ID: <761694744.496919.1599713892772.JavaMail.zimbra@odiso.com>
In-Reply-To: <e80f1080-253d-c43c-4402-258855bcbf18@proxmox.com>

Thanks Thomas for the investigations.

I'm still trying to reproduce...
I think I have a special case here, because the forum user with 30 nodes had a corosync cluster split. (Note that I hit this bug 6 months ago too, also when shutting down a node, and the only fix was to fully stop corosync on all nodes and then start it again on all nodes.)


But this time, the corosync logs look fine (every node correctly sees node2 down and sees the remaining nodes).

The surviving node7 was the only node with HA, and the LRM didn't have the watchdog enabled (I haven't found any log like "pve-ha-lrm: watchdog active" for the last 6 months on these nodes).


So, the timing was:

10:39:05 : "halt" command is sent to node2
10:39:16 : node2 leaves corosync / halts -> every node sees it and correctly forms a new membership with the 13 remaining nodes

...I don't see any special logs (corosync, pmxcfs, pve-ha-crm, pve-ha-lrm) after node2 left.
But there is still activity on the servers: pve-firewall is still logging, VMs are running fine.


between 10:40:25 - 10:40:34 : the watchdog resets the nodes, but not node7.

-> so this is 70-80s after node2 went down, and I think watchdog-mux was still running fine until then.
   (That sounds like the LRM was stuck, and client_watchdog_timeout expired in watchdog-mux.)
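
To make the arithmetic explicit, here is a small C sketch (my own helper, not code from watchdog-mux). It assumes a 60s client timeout and a 10s hardware watchdog timeout; both values are assumptions that should be checked against watchdog-mux.c, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, NOT taken from the logs: verify in watchdog-mux.c. */
#define ASSUMED_CLIENT_TIMEOUT_S 60 /* client_watchdog_timeout */
#define ASSUMED_HW_TIMEOUT_S     10 /* /dev/watchdog timeout */

/* Parse "HH:MM:SS" into seconds since midnight, or -1 on malformed input. */
int hms_to_seconds(const char *hms)
{
    int h, m, s;

    assert(hms != NULL);
    if (sscanf(hms, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    if (h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 59)
        return -1;
    return h * 3600 + m * 60 + s;
}

/* Seconds elapsed between two same-day timestamps, or -1 on error. */
int elapsed_seconds(const char *from, const char *to)
{
    int a = hms_to_seconds(from);
    int b = hms_to_seconds(to);

    if (a < 0 || b < 0 || b < a)
        return -1;
    return b - a;
}

/* Earliest reset expected after a client stops updating, under the
 * assumed timeouts: the client timeout expires first, then the mux
 * stops updating the device and the hardware timeout runs out. */
int expected_reset_delay(void)
{
    return ASSUMED_CLIENT_TIMEOUT_S + ASSUMED_HW_TIMEOUT_S;
}
```

With the timestamps above, elapsed_seconds("10:39:16", "10:40:25") gives 69 and elapsed_seconds("10:39:16", "10:40:34") gives 78. That would be roughly consistent with a 60s + 10s chain if the LRM stopped updating around the time node2 left, but it is only a plausibility check, not proof.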



10:40:41 : node7 loses quorum (as all other nodes have reset).



10:40:50: node7 crm/lrm finally log.

Sep  3 10:40:50 m6kvm7 pve-ha-crm[16196]: got unexpected error - error during cfs-locked 'domain-ha' operation: no quorum!
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: loop take too long (87 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: loop take too long (92 seconds)
Sep  3 10:40:51 m6kvm7 pve-ha-crm[16196]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Sep  3 10:40:51 m6kvm7 pve-ha-lrm[16140]: lost lock 'ha_agent_m6kvm7_lock - cfs lock update failed - Permission denied



So, I really think that something got the lrm/crm loop stuck, and the watchdog was not reset because of that.





----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "aderumier" <aderumier@odiso.com>, "dietmar" <dietmar@proxmox.com>
Sent: Wednesday, 9 September 2020 22:05:49
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 08.09.20 09:11, Alexandre DERUMIER wrote:
>>> It would really help if we can reproduce the bug somehow. Do you have an idea how
>>> to trigger the bug?
>
> I really don't know. I'm currently trying to reproduce it on the same cluster, with softdog && noboot=1, and rebooting a node.
>
>
> Maybe it's related to the number of VMs, or the number of nodes, I don't have any clue ...

I checked the watchdog code a bit, our user-space mux one and the kernel drivers,
and am just noting a few things here (thinking out loud):

The /dev/watchdog itself is always active, else we could lose it to some
other program and not be able to activate HA dynamically.
But, as long as no HA service got active, it's a simple dummy "wake up every
second and do an ioctl keep-alive update".
This is really simple and efficiently written, so if that fails for over 10s
the system is really loaded, probably barely responding to anything.
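
For readers who have not looked at the watchdog API, the "wake up every second and do an ioctl keep-alive update" part can be sketched roughly as below. This is a simplified illustration and not how watchdog-mux is actually structured; the function names are hypothetical, only WDIOC_KEEPALIVE and the ioctl call come from the Linux watchdog API.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Open a watchdog device node. CAUTION: on a real /dev/watchdog this
 * starts the hardware timer, so the caller must keep updating it or
 * the machine will be reset. */
int open_watchdog(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY);
}

/* Send one keep-alive. Returns 0 on success, -1 on failure (errno is
 * left as set by ioctl). */
int watchdog_keepalive(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0) == -1 ? -1 : 0;
}

/* Dummy update loop: one keep-alive per interval. Returns how many
 * updates failed. If the process is not scheduled for longer than the
 * hardware timeout, the device resets the node regardless of HA state. */
int keepalive_loop(int fd, unsigned iterations, unsigned interval_s)
{
    int failed = 0;

    for (unsigned i = 0; i < iterations; i++) {
        if (watchdog_keepalive(fd) == -1)
            failed++;
        sleep(interval_s);
    }
    return failed;
}
```

Pointing open_watchdog() at something that is not a watchdog (for example /dev/null) is a safe way to see the failure path: every keep-alive then returns -1.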

Currently the watchdog-mux runs as a normal process, no re-nice, no real-time
scheduling. This is IMO wrong, as it is a critical process which needs to be
run with high priority. I have a patch here which sets it to the highest RR
realtime-scheduling priority available, effectively the same as what corosync does.


diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
index 818ae00..71981d7 100644
--- a/src/watchdog-mux.c
+++ b/src/watchdog-mux.c
@@ -8,2 +8,3 @@
 #include <time.h>
+#include <sched.h>
 #include <sys/ioctl.h>
@@ -151,2 +177,15 @@ main(void)
 
+    int sched_priority = sched_get_priority_max (SCHED_RR);
+    if (sched_priority != -1) {
+        struct sched_param global_sched_param;
+        global_sched_param.sched_priority = sched_priority;
+        int res = sched_setscheduler (0, SCHED_RR, &global_sched_param);
+        if (res == -1) {
+            fprintf(stderr, "Could not set SCHED_RR at priority %d\n", sched_priority);
+        } else {
+            fprintf(stderr, "set SCHED_RR at priority %d\n", sched_priority);
+        }
+    }
+
+
     if ((watchdog_fd = open(WATCHDOG_DEV, O_WRONLY)) == -1) {

The issue of a watchdog reset without HA, caused by a massively overloaded system,
should already be largely avoided by the scheduling change alone.

Interesting, IMO, is that lots of nodes rebooted at the same time, with no HA active.
This *could* come from a side-effect like ceph rebalancing kicking off and producing
a load spike for >10s, hindering the scheduling of the watchdog-mux.
This is a theory, but with HA off it needs to be something like that, as in the HA-off
case there's *no* direct or indirect connection between corosync/pmxcfs and the
watchdog-mux. It simply does not care about, or notice, quorum partition changes at all.


There may be an approach to reserve the watchdog for the mux, but avoid having it
as a "ticking time bomb":
Theoretically one could open it, then disable it with an ioctl (it can be queried
if a driver supports that) and only enable it for real once the first client connects
to the MUX. This may not work for all watchdog modules, and if we do this, we may want
to make it configurable, as some people actually want a reset if a (future) real-time
process cannot be scheduled for >= 10 seconds.

With HA active, well then there could be something off, either in corosync/knet or 
also in how we interface with it in pmxcfs, that could well be, but won't explain the 
non-HA issues. 

Speaking of pmxcfs, that one also runs with standard priority; we may want to change
that too to a RT scheduler, so that it's ensured it can process all corosync events.

I also have a few other small watchdog mux patches around: it should nowadays actually
be able to tell us why a reset happened (can also be over/under voltage, temperature,
...) and I'll retry the keep-alive ioctl a few times if it fails; we can only
win with that after all.
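
As an illustration of those two ideas (not the actual patches), the reset reason can be read with WDIOC_GETBOOTSTATUS and decoded from the WDIOF_* flags in <linux/watchdog.h>, and the keep-alive can be retried a bounded number of times. The function names and the reason strings are made up for this sketch, and whether a driver fills in these flags at all depends on the hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Append one reason name to buf, comma separated, without overflowing. */
static void append_reason(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);

    if (used >= len - 1)
        return;
    snprintf(buf + used, len - used, "%s%s", used ? "," : "", name);
}

/* Turn a WDIOC_GETBOOTSTATUS bitmask into a readable list. */
void describe_boot_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (status & WDIOF_CARDRESET)
        append_reason(buf, len, "watchdog-reset");
    if (status & WDIOF_OVERHEAT)
        append_reason(buf, len, "overheat");
    if (status & WDIOF_POWERUNDER)
        append_reason(buf, len, "undervoltage");
    if (status & WDIOF_POWEROVER)
        append_reason(buf, len, "overvoltage");
    if (buf[0] == '\0')
        append_reason(buf, len, "none-reported");
}

/* Read the boot status from an open watchdog fd. Returns 0 or -1. */
int read_boot_status(int fd, int *status)
{
    assert(status != NULL);
    return ioctl(fd, WDIOC_GETBOOTSTATUS, status) == -1 ? -1 : 0;
}

/* Retry the keep-alive up to max_tries times. Returns the attempt
 * number that succeeded, or -1 if all attempts failed. A real mux
 * would probably pause briefly between attempts. */
int keepalive_with_retry(int fd, int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        if (ioctl(fd, WDIOC_KEEPALIVE, 0) != -1)
            return i;
    }
    return -1;
}
```

Logging the decoded string once at mux startup would at least distinguish a watchdog-triggered reset from a power or thermal event after the fact.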



