From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    by lists.proxmox.com (Postfix) with ESMTPS id 7320662918
    for ; Thu, 17 Sep 2020 13:35:57 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
    by firstgate.proxmox.com (Proxmox) with ESMTP id 6496AD753
    for ; Thu, 17 Sep 2020 13:35:57 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [212.186.127.180])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    by firstgate.proxmox.com (Proxmox) with ESMTPS id E1F48D746
    for ; Thu, 17 Sep 2020 13:35:56 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
    by proxmox-new.maurer-it.com (Proxmox) with ESMTP id A1DCB45413;
    Thu, 17 Sep 2020 13:35:56 +0200 (CEST)
To: Proxmox VE development discussion , Alexandre DERUMIER
References: <216436814.339545.1599142316781.JavaMail.zimbra@odiso.com>
    <1767271081.853403.1600245029802.JavaMail.zimbra@odiso.com>
    <1894376736.864562.1600253445817.JavaMail.zimbra@odiso.com>
    <2054513461.868164.1600262132255.JavaMail.zimbra@odiso.com>
    <2bdde345-b966-d393-44d1-e5385821fbad@proxmox.com>
    <65105078.871552.1600269422383.JavaMail.zimbra@odiso.com>
    <1600333910.bmtyynl8cl.astroid@nora.none>
    <475756962.894651.1600336772315.JavaMail.zimbra@odiso.com>
    <86855479.894870.1600336947072.JavaMail.zimbra@odiso.com>
From: Thomas Lamprecht
Message-ID: <501f031f-3f1b-0633-fab3-7fcb7fdddaf5@proxmox.com>
Date: Thu, 17 Sep 2020 13:35:55 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101 Thunderbird/81.0
MIME-Version: 1.0
In-Reply-To: <86855479.894870.1600336947072.JavaMail.zimbra@odiso.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results: 0
    AWL                 -0.166 Adjusted score from AWL reputation of From: address
    KAM_DMARC_STATUS      0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
    NICE_REPLY_A        -0.062 Looks like a legit reply (A)
    RCVD_IN_DNSWL_MED     -2.3 Sender listed at https://www.dnswl.org/, medium trust
    SPF_HELO_NONE        0.001 SPF: HELO does not publish an SPF Record
    SPF_PASS            -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 17 Sep 2020 11:35:57 -0000

On 9/17/20 12:02 PM, Alexandre DERUMIER wrote:
> if needed, here my test script to reproduce it

Thanks, I'm now using this specific one. I had a similar one (but with all
nodes writing) running here for ~ two hours without luck so far, let's see
how this one behaves.

>
> node1 (restart corosync until node2 don't send the timestamp anymore)
> -----
>
> #!/bin/bash
>
> for i in `seq 10000`; do
>     now=$(date +"%T")
>     echo "restart corosync : $now"
>     systemctl restart corosync
>     for j in {1..59}; do
>         last=$(cat /tmp/timestamp)
>         curr=`date '+%s'`
>         diff=$(($curr - $last))
>         if [ $diff -gt 20 ]; then
>             echo "too old"
>             exit 0
>         fi
>         sleep 1
>     done
> done
>
>
>
> node2 (write to /etc/pve/test each second, then send the last timestamp to node1)
> -----
> #!/bin/bash
> for i in {1..10000};
> do
>     now=$(date +"%T")
>     echo "Current time : $now"
>     curr=`date '+%s'`
>     ssh root@node1 "echo $curr > /tmp/timestamp"
>     echo "test" > /etc/pve/test
>     sleep 1
> done
>