From: Thomas Lamprecht
To: Alexandre DERUMIER, Proxmox VE development discussion
Date: Wed, 16 Sep 2020 16:45:12 +0200
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Message-ID: <2bdde345-b966-d393-44d1-e5385821fbad@proxmox.com>
In-Reply-To: <2054513461.868164.1600262132255.JavaMail.zimbra@odiso.com>

On 9/16/20 3:15 PM, Alexandre DERUMIER wrote:
> I have reproduce it again, with pmxcfs in debug mode
>
> corosync restart at 15:02:10, and it was already block on other nodes at 15:02:12
>
> The pmxcfs was still logging after the lock.
>
>
> here the log on node1 where corosync has been restarted
>
> http://odisoweb1.odiso.net/pmxcfs-corosync.log
>

Thanks for those, I need a bit of time to sift through them. It seems like either
dfsm gets out of sync or we do not get an ACK reply from cpg_send.

A full core dump would still be nice; in gdb:

generate-core-file

PS: instead of manually switching between threads, you can run:

thread apply all bt full

to get a backtrace for all threads in one command.
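
If attaching interactively is inconvenient, the two gdb commands above can probably be run in one non-interactive pass. This is only a minimal sketch: the function name `dump_process`, the `GDB` override variable, and the default core path are made up for illustration, and it assumes gdb is installed on the node and may attach to the process.

```shell
# Attach to a running process, write a core file, print a full
# backtrace for every thread, then detach (gdb exits in batch mode).
# GDB can be overridden (e.g. GDB=echo) to preview the arguments
# without actually attaching to anything.
dump_process() {
    pid="$1"                      # PID of the process to inspect
    core="${2:-/tmp/core.$pid}"   # hypothetical default output path
    "${GDB:-gdb}" -p "$pid" -batch \
        -ex "generate-core-file $core" \
        -ex "thread apply all bt full"
}
```

Usage would be something like `dump_process "$(pidof pmxcfs)" /tmp/pmxcfs.core > pmxcfs-bt.txt`, where the output file names are placeholders. Note that the process is paused while gdb is attached and that the core file can be large.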