From: Alexandre DERUMIER <aderumier@odiso.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Mon, 21 Sep 2020 01:54:59 +0200 (CEST)
Message-ID: <335862527.964527.1600646099489.JavaMail.zimbra@odiso.com>
In-Reply-To: <501f031f-3f1b-0633-fab3-7fcb7fdddaf5@proxmox.com>
Hi,
I ran a new test, this time with "systemctl stop corosync", wait 15s, "systemctl start corosync", wait 15s.
I was able to reproduce it at the corosync stop on node1: 1 second later, /etc/pve was locked on all other nodes.
I started corosync on node1 again about 10 minutes later, and /etc/pve became writable again on all nodes.
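For reference, the stop/start cycle described above could be scripted roughly as follows. This is only a sketch: the function names `run` and `cycle_corosync` and the `DRY_RUN`, `STOP_WAIT` and `START_WAIT` variables are made up for illustration. Only the systemctl commands and the 15s waits come from the test itself.

```shell
#!/bin/bash
# Sketch of the stop / wait / start / wait cycle (names are illustrative).
STOP_WAIT=${STOP_WAIT:-15}    # seconds to wait after stopping corosync
START_WAIT=${START_WAIT:-15}  # seconds to wait after starting it again

# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# One test cycle: stop corosync, wait, start it again, wait.
cycle_corosync() {
    echo "corosync stop : $(date +%T)"
    run systemctl stop corosync
    run sleep "$STOP_WAIT"
    echo "corosync start: $(date +%T)"
    run systemctl start corosync
    run sleep "$START_WAIT"
}
```

Calling `DRY_RUN=1 cycle_corosync` only prints the commands, which can be useful to check the sequence before running it on a real node.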
node1: corosync stop: 01:26:50
node2: /etc/pve locked: 01:26:51
http://odisoweb1.odiso.net/corosync-stop.log
pmxcfs: bt full, all threads:
https://gist.github.com/aderumier/c45af4ee73b80330367e416af858bc65
pmxcfs: coredump: http://odisoweb1.odiso.net/core.17995.gz
node1: corosync start: 01:35:36
http://odisoweb1.odiso.net/corosync-start.log
BTW, I was contacted by PM on the forum by a user following this mailing-list thread,
and he recently had exactly the same problem with a 7-node cluster
(after shutting down 1 node, /etc/pve was locked until the node was restarted).
----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>, "aderumier" <aderumier@odiso.com>
Sent: Thursday, 17 September 2020 13:35:55
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On 9/17/20 12:02 PM, Alexandre DERUMIER wrote:
> if needed, here is my test script to reproduce it
Thanks, I'm now using this specific one. I had a similar one (but with all nodes writing)
running here for ~two hours without luck yet; let's see how this behaves.
>
> node1 (restart corosync until node2 no longer sends the timestamp)
> -----
>
> #!/bin/bash
>
> for i in $(seq 10000); do
>     now=$(date +"%T")
>     echo "restart corosync : $now"
>     systemctl restart corosync
>     for j in {1..59}; do
>         last=$(cat /tmp/timestamp)
>         curr=$(date '+%s')
>         diff=$(($curr - $last))
>         if [ "$diff" -gt 20 ]; then
>             echo "too old"
>             exit 0
>         fi
>         sleep 1
>     done
> done
>
>
>
> node2 (write to /etc/pve/test each second, then send the last timestamp to node1)
> -----
> #!/bin/bash
> for i in {1..10000}; do
>     now=$(date +"%T")
>     echo "Current time : $now"
>     curr=$(date '+%s')
>     ssh root@node1 "echo $curr > /tmp/timestamp"
>     echo "test" > /etc/pve/test
>     sleep 1
> done
>
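If a write to /etc/pve blocks rather than fails, the node2 loop above would stall on the `echo` itself. A possible variant, only a sketch, wraps the write in `timeout` (coreutils) so that a blocked write is reported instead. `probe_write` and `timestamp_age` are hypothetical helper names, the 5-second limit is an arbitrary choice, and this assumes the blocked write can be interrupted by a signal.

```shell
#!/bin/bash
# Try to write to a file, giving up after a time limit (default 5s).
# Prints "writable" on success, "blocked or failed" otherwise.
probe_write() {
    local path=$1 limit=${2:-5}
    if timeout "$limit" bash -c 'echo test > "$1"' _ "$path" 2>/dev/null; then
        echo "writable"
    else
        echo "blocked or failed"
        return 1
    fi
}

# Print the age in seconds of the epoch timestamp stored in a file.
# Returns non-zero if the file is missing or does not hold a number.
timestamp_age() {
    local file=$1 last
    last=$(cat "$file" 2>/dev/null) || return 1
    [[ $last =~ ^[0-9]+$ ]] || return 1
    echo $(( $(date +%s) - last ))
}
```

On node2 this could replace the bare write, for example `probe_write /etc/pve/test 5 || echo "write blocked at $(date +%T)"`, so the time at which writes stop succeeding gets logged. On node1, `timestamp_age /tmp/timestamp` would give the same value as `diff` in the restart script, while also handling a missing file.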