From: Alexandre DERUMIER
To: Proxmox VE development discussion
Cc: Thomas Lamprecht
Date: Wed, 16 Sep 2020 09:58:02 +0200 (CEST)
Message-ID: <602718914.852368.1600243082185.JavaMail.zimbra@odiso.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

Here is a backtrace with pve-cluster-dbgsym installed:

(gdb) bt full
#0 0x00007fce67721896 in do_futex_wait.constprop () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1 0x00007fce67721988 in __new_sem_wait_slow.constprop.0 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2 0x00007fce678c5f98 in fuse_session_loop_mt () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#3 0x00007fce678cb577 in fuse_loop_mt () from /lib/x86_64-linux-gnu/libfuse.so.2
No symbol table info available.
#4 0x00005617f6d5ab0e in main (argc=<optimized out>, argv=<optimized out>) at pmxcfs.c:1055
        ret = -1
        lockfd = <optimized out>
        pipefd = {8, 9}
        foreground = 0
        force_local_mode = 0
        wrote_pidfile = 1
        memdb = 0x5617f7c563b0
        dcdb = 0x5617f8046ca0
        status_fsm = 0x5617f806a630
        context = <optimized out>
        entries = {{long_name = 0x5617f6d7440f "debug", short_name = 100 'd', flags = 0, arg = G_OPTION_ARG_NONE, arg_data = 0x5617f6d82104 , description = 0x5617f6d742cb "Turn on debug messages", arg_description = 0x0}, {long_name = 0x5617f6d742e2 "foreground", short_name = 102 'f', flags = 0, arg = G_OPTION_ARG_NONE, arg_data = 0x7ffd672edc38, description = 0x5617f6d742ed "Do not daemonize server", arg_description = 0x0}, {long_name = 0x5617f6d74305 "local", short_name = 108 'l', flags = 0, arg = G_OPTION_ARG_NONE, arg_data = 0x7ffd672edc3c, description = 0x5617f6d746a0 "Force local mode (ignore corosync.conf, force quorum)", arg_description = 0x0}, {long_name = 0x0, short_name = 0 '\000', flags = 0, arg = G_OPTION_ARG_NONE, arg_data = 0x0, description = 0x0, arg_description = 0x0}}
        err = 0x0
        __func__ = "main"
        utsname = {sysname = "Linux", '\000' , nodename = "m6kvm1", '\000' , release = "5.4.60-1-pve", '\000' , version = "#1 SMP PVE 5.4.60-1 (Mon, 31 Aug 2020 10:36:22 +0200)", '\000' , machine = "x86_64", '\000' , __domainname = "(none)", '\000' }
        dot = <optimized out>
        www_data = <optimized out>
        create = <optimized out>
        conf_data = 0x5617f80466a0
        len = <optimized out>
        config = <optimized out>
        bplug = <optimized out>
        fa = {0x5617f6d7441e "-f", 0x5617f6d74421 "-odefault_permissions", 0x5617f6d74437 "-oallow_other", 0x0}
        fuse_args = {argc = 1, argv = 0x5617f8046960, allocated = 1}
        fuse_chan = 0x5617f7c560c0
        corosync_loop = 0x5617f80481d0
        service_quorum = 0x5617f8048460
        service_confdb = 0x5617f8069da0
        service_dcdb = 0x5617f806a5d0
        service_status = 0x5617f806a8e0

----- Original Mail -----
From: "aderumier"
To: "Proxmox VE development discussion"
Cc: "Thomas Lamprecht"
Sent: Wednesday, 16 September 2020 09:34:27
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

Hi,

I have reproduced the problem now, and this time I have not restarted corosync a second time after the lock of /etc/pve, so it is currently read-only.

I have not used gdb in a long time; could you tell me how to attach to the running pmxcfs process, and which gdb commands to run?

----- Original Mail -----
From: "aderumier"
To: "Proxmox VE development discussion"
Cc: "Thomas Lamprecht"
Sent: Tuesday, 15 September 2020 17:58:33
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

Another small lock at 17:41:09.

To be sure, I have done a small loop of writes, one per second, in /etc/pve on node node2. It hangs at the first corosync restart; then, on the second corosync restart, it works again.

I'll try to improve this tomorrow to be able to debug the corosync process:
- restart corosync
- do some writes in /etc/pve/
- and if it's hanging, don't restart corosync again
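A minimal version of the per-second write loop could look like this (an assumed reconstruction; the exact script was not posted in the thread):

    while true; do
        # print a timestamp every second; the output below stops when the write blocks
        echo "Current time : $(date +%H:%M:%S)"
        echo test > /etc/pve/test
        sleep 1
    done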
node2: echo test > /etc/pve/test loop
--------------------------------------
Current time : 17:41:01
Current time : 17:41:02
Current time : 17:41:03
Current time : 17:41:04
Current time : 17:41:05
Current time : 17:41:06
Current time : 17:41:07
Current time : 17:41:08
Current time : 17:41:09
hang
Current time : 17:42:05
Current time : 17:42:06
Current time : 17:42:07

node1
-----
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] pmtud: PMTUD completed for host: 6 link: 0 current link mtu: 1397
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] pmtud: Starting PMTUD for host: 10 link: 0
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] udp: detected kernel MTU: 1500
Sep 15 17:41:08 m6kvm1 corosync[18145]: [TOTEM ] Knet pMTU change: 1397
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] pmtud: PMTUD link change for host: 10 link: 0 from 469 to 1397
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] pmtud: PMTUD completed for host: 10 link: 0 current link mtu: 1397
Sep 15 17:41:08 m6kvm1 corosync[18145]: [KNET ] pmtud: Global data MTU changed to: 1397
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] IPC credentials authenticated (/dev/shm/qb-18145-16239-31-zx6KJM/qb)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] connecting to client [16239]
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] connection created
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QUORUM] lib_init_fn: conn=0x556c2918d5f0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QUORUM] got quorum_type request on 0x556c2918d5f0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QUORUM] got trackstart request on 0x556c2918d5f0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QUORUM] sending initial status to 0x556c2918d5f0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QUORUM] sending quorum notification to 0x556c2918d5f0, length = 52
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] IPC credentials authenticated (/dev/shm/qb-18145-16239-32-I7ZZ6e/qb)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] connecting to client [16239]
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] connection created
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CMAP ] lib_init_fn: conn=0x556c2918ef20
Sep 15 17:41:09 m6kvm1 pmxcfs[16239]: [status] notice: update cluster info (cluster name m6kvm, version = 20)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] IPC credentials authenticated (/dev/shm/qb-18145-16239-33-6RKbvH/qb)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] connecting to client [16239]
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] connection created
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] lib_init_fn: conn=0x556c2918ad00, cpd=0x556c2918b50c
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] IPC credentials authenticated (/dev/shm/qb-18145-16239-34-GAY5T9/qb)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] connecting to client [16239]
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] connection created
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] lib_init_fn: conn=0x556c2918c740, cpd=0x556c2918ce8c
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] Creating commit token because I am the rep.
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] Saving state aru 5 high seq received 5
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Storing new sequence id for ring 1197
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] entering COMMIT state.
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] got commit token
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] entering RECOVERY state.
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] TRANS [0] member 1:
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] position [0] member 1:
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] previous ringid (1.1193)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] aru 5 high delivered 5 received flag 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] position [1] member 2:
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] previous ringid (2.1192)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] aru 123 high delivered 123 received flag 1
.... (identical "previous ringid (2.1192), aru 123 high delivered 123 received flag 1" entries for positions [2] through [13], members 3 through 14) ....
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] Did not need to originate any messages in recovery.
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] got commit token
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] Sending initial ORF token
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] install seq 0 aru 0 high seq received 0
.... (same token retrans / install seq pair repeated for counts 1 through 3) ....
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] Resetting old ring state
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] recovery to regular 1-0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] waiting_trans_ack changed to 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.90)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.91)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.92)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.93)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.94)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.95)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.96)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.97)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.107)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.108)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.109)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.110)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [MAIN ] Member joined: r(0) ip(10.3.94.111)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [SYNC ] call init for locally known services
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] entering OPERATIONAL state.
Sep 15 17:41:09 m6kvm1 corosync[18145]: [TOTEM ] A new membership (1.1197) was formed. Members joined: 2 3 4 5 6 7 8 9 10 11 12 13 14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [SYNC ] enter sync process
Sep 15 17:41:09 m6kvm1 corosync[18145]: [SYNC ] Committing synchronization for corosync configuration map access
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] downlist left_list: 0 received
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] got joinlist message from node 14
.... (further "downlist left_list: 0 received" / "got joinlist message" pairs for nodes 2 through 13) ....
Sep 15 17:41:09 m6kvm1 corosync[18145]: [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] my downlist: members(old:1 left:0)
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] joinlist_messages[0] group:pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.110) , pid:30209
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] joinlist_messages[1] group:pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.110) , pid:30209
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] joinlist_messages[2] group:pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.109) , pid:31350
Sep 15 17:41:09 m6kvm1 corosync[18145]: [CPG ] joinlist_messages[3] group:pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.109) , pid:31350
.... (matching pve_kvstore_v1/pve_dcdb_v1 pairs for 10.3.94.108 pid:3569, .107 pid:19504, .97 pid:11947, .96 pid:20814, .95 pid:39420, .94 pid:12452, .93 pid:44300, .92 pid:42259, .91 pid:40630, .90 pid:25870 and .111 pid:25634) ....
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] flags: quorate: No Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] Sending nodelist callback. ring_id = 1.1197
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 13
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[13]: votes: 1, expected: 14 flags: 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] total_votes=2, expected_votes=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 13 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 1 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 13
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[14]: votes: 1, expected: 14 flags: 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] total_votes=3, expected_votes=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 13 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 14 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 1 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 14 flags: 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] flags: quorate: No Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] total_votes=3, expected_votes=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 13 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 14 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 1 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 2
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 14 flags: 1
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] total_votes=4, expected_votes=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 2 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 13 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 14 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] node 1 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 2
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Sep 15 17:41:09 m6kvm1 corosync[18145]: [VOTEQ ] got nodeinfo message from cluster node 3
....
....

next corosync restart

Sep 15 17:42:03 m6kvm1 corosync[18145]: [MAIN ] Node was shut down by a signal
Sep 15 17:42:03 m6kvm1 corosync[18145]: [SERV ] Unloading all Corosync service engines.
Sep 15 17:42:03 m6kvm1 corosync[18145]: [QB ] withdrawing server sockets
Sep 15 17:42:03 m6kvm1 corosync[18145]: [QB ] qb_ipcs_unref() - destroying
Sep 15 17:42:03 m6kvm1 corosync[18145]: [SERV ] Service engine unloaded: corosync vote quorum service v1.0
Sep 15 17:42:03 m6kvm1 corosync[18145]: [QB ] qb_ipcs_disconnect(/dev/shm/qb-18145-16239-32-I7ZZ6e/qb) state:2
Sep 15 17:42:03 m6kvm1 pmxcfs[16239]: [confdb] crit: cmap_dispatch failed: 2
Sep 15 17:42:03 m6kvm1 corosync[18145]: [MAIN ] cs_ipcs_connection_closed()
Sep 15 17:42:03 m6kvm1 corosync[18145]: [CMAP ] exit_fn for conn=0x556c2918ef20
Sep 15 17:42:03 m6kvm1 corosync[18145]: [MAIN ] cs_ipcs_connection_destroyed()

node2
-----
Sep 15 17:41:05 m6kvm2 corosync[25411]: [KNET ] pmtud: Starting PMTUD for host: 10 link: 0
Sep 15 17:41:05 m6kvm2 corosync[25411]: [KNET ] udp: detected kernel MTU: 1500
Sep 15 17:41:05 m6kvm2 corosync[25411]: [KNET ] pmtud: PMTUD completed for host: 10 link: 0 current link mtu: 1397
Sep 15 17:41:07 m6kvm2 corosync[25411]: [KNET ] rx: host: 1 link: 0 received pong: 2
Sep 15 17:41:08 m6kvm2 corosync[25411]: [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
Sep 15 17:41:08 m6kvm2 corosync[25411]: [TOTEM ] entering GATHER state from 11(merge during join).
Sep 15 17:41:08 m6kvm2 corosync[25411]: [KNET ] rx: Source host 1 not reachable yet. Discarding packet.
.... (the same "Source host 1 not reachable yet. Discarding packet." message repeats through 17:41:09) ....
Sep 15 17:41:09 m6kvm2 corosync[25411]: [KNET ] rx: host: 1 link: 0 is up
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] Knet host change callback. nodeid: 1 reachable: 1
Sep 15 17:41:09 m6kvm2 corosync[25411]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Sep 15 17:41:09 m6kvm2 corosync[25411]: [KNET ] pmtud: Starting PMTUD for host: 1 link: 0
Sep 15 17:41:09 m6kvm2 corosync[25411]: [KNET ] udp: detected kernel MTU: 1500
Sep 15 17:41:09 m6kvm2 corosync[25411]: [KNET ] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 1397
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] got commit token
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] Saving state aru 123 high seq received 123
Sep 15 17:41:09 m6kvm2 corosync[25411]: [MAIN ] Storing new sequence id for ring 1197
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] entering COMMIT state.
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] got commit token
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] entering RECOVERY state.
corosync[25411]: [TOTEM ] position [6] member 7:=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [7] member 8:=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [8] member 9:=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [9] member 10:=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [10] member 11:= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [11] member 12:= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [12] member 13:= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] position [13] member 14:= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] previous ringid (2.1192)= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] aru 123 high delivered 123= received flag 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] Did not need to originate = any messages in recovery.=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] token retrans flag is 0 my= set retrans flag0 retrans queue empty 1 count 0, aru ffffffff=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] install seq 0 aru 0 high s= eq received 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] token retrans flag is 0 my= set retrans flag0 retrans queue empty 1 count 1, aru 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] install seq 0 aru 0 high s= eq received 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] token retrans flag is 0 my= set retrans flag0 retrans queue empty 1 count 2, aru 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] install seq 0 aru 0 high s= eq received 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] token retrans flag is 0 my= set retrans flag0 retrans queue empty 1 count 3, aru 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] install seq 0 aru 0 high s= eq received 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] retrans flag count 4 token= aru 0 install seq 0 aru 0 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] Resetting old ring state= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] recovery to regular 1-0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] waiting_trans_ack changed = to 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [MAIN ] Member joined: r(0) ip(10.3= .94.89)=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [SYNC ] call init for locally known= services=20 Sep 15 
17:41:09 m6kvm2 corosync[25411]: [TOTEM ] entering OPERATIONAL state= .=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] A new membership (1.1197) = was formed. Members joined: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [SYNC ] enter sync process=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [SYNC ] Committing synchronization = for corosync configuration map access=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CMAP ] Not first sync -> no action= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 2=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 3=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 4=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 5=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 6=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 7=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 8=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 9=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 10=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 11=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 12=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] downlist left_list: 0 receiv= ed=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got joinlist message from no= de 13=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [SYNC ] Committing synchronization = for corosync cluster closed process group service v1.01=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] my downlist: members(old:13 = left:0)=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[0] group:p= ve_kvstore_v1\x00, ip:r(0) ip(10.3.94.110) , pid:30209=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[1] group:p= ve_dcdb_v1\x00, ip:r(0) ip(10.3.94.110) , pid:30209=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[2] group:p= ve_kvstore_v1\x00, ip:r(0) ip(10.3.94.109) , pid:31350=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[3] group:p= ve_dcdb_v1\x00, ip:r(0) ip(10.3.94.109) , pid:31350=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[4] 
group:p= ve_kvstore_v1\x00, ip:r(0) ip(10.3.94.108) , pid:3569=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[5] group:p= ve_dcdb_v1\x00, ip:r(0) ip(10.3.94.108) , pid:3569=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[6] group:p= ve_kvstore_v1\x00, ip:r(0) ip(10.3.94.107) , pid:19504=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[7] group:p= ve_dcdb_v1\x00, ip:r(0) ip(10.3.94.107) , pid:19504=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[8] group:p= ve_kvstore_v1\x00, ip:r(0) ip(10.3.94.97) , pid:11947=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[9] group:p= ve_dcdb_v1\x00, ip:r(0) ip(10.3.94.97) , pid:11947=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[10] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.96) , pid:20814=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[11] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.96) , pid:20814=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[12] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.95) , pid:39420=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[13] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.95) , pid:39420=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[14] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.94) , pid:12452=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[15] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.94) , pid:12452=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[16] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.93) , pid:44300=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[17] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.93) , pid:44300=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[18] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.92) , pid:42259=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[19] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.92) , pid:42259=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[20] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.91) , pid:40630=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[21] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.91) , pid:40630=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[22] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.90) , pid:25870=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[23] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.90) , pid:25870=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[24] group:= pve_kvstore_v1\x00, ip:r(0) ip(10.3.94.111) , pid:25634=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] joinlist_messages[25] group:= pve_dcdb_v1\x00, ip:r(0) ip(10.3.94.111) , pid:25634=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] Sending nodelist callback.= ring_id =3D 1.1197=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 13=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[13]: vote= s: 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 
m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 13=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[14]: vote= s: 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[1]: votes= : 1, expected: 14 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: No Leaving= : No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote:= No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] total_votes=3D14, expected= _votes=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 1 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 3 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 4 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 5 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 6 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 7 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 8 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 9 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 10 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 11 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 12 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 13 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 14 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 2 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] lowest node id: 1 us: 2=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] highest node id: 14 us: 2= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 2=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[2]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] total_votes=3D14, expected= _votes=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 1 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 
17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 3 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 4 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 5 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 6 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 7 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 8 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 9 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 10 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 11 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 12 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 13 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 14 state=3D1, votes= =3D1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 2 state=3D1, votes=3D= 1, expected=3D14=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] lowest node id: 1 us: 2=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] highest node id: 14 us: 2= =20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 2=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 3=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[3]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 3=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 4=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[4]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 4=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 5=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[5]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 5=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 6=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[6]: votes= : 1, 
expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 6=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 7=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[7]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 7=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 8=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[8]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 8=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 9=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[9]: votes= : 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 9=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 10=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[10]: vote= s: 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 10=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 11=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[11]: vote= s: 1, expected: 14 flags: 1=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leavin= g: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote= : No QdeviceMasterWins: No=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 11=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes= : 0, expected: 0 flags: 0=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from = cluster node 12=20 Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[12]: vote= s: 1, expected: 14 flags: 1=20 
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] got nodeinfo message from cluster node 12
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Sep 15 17:41:09 m6kvm2 corosync[25411]: [SYNC ] Committing synchronization for corosync vote quorum service v1.0
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] total_votes=14, expected_votes=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 1 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 3 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 4 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 5 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 6 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 7 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 8 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 9 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 10 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 11 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 12 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 13 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 14 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] node 2 state=1, votes=1, expected=14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] lowest node id: 1 us: 2
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] highest node id: 14 us: 2
Sep 15 17:41:09 m6kvm2 corosync[25411]: [QUORUM] Members[14]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Sep 15 17:41:09 m6kvm2 corosync[25411]: [QUORUM] sending quorum notification to (nil), length = 104
Sep 15 17:41:09 m6kvm2 corosync[25411]: [VOTEQ ] Sending quorum callback, quorate = 1
Sep 15 17:41:09 m6kvm2 corosync[25411]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 15 17:41:09 m6kvm2 corosync[25411]: [TOTEM ] waiting_trans_ack changed to 0
Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got procjoin message from cluster node 1 (r(0) ip(10.3.94.89) ) for pid 16239
Sep 15 17:41:09 m6kvm2 corosync[25411]: [CPG ] got procjoin message from cluster node 1 (r(0) ip(10.3.94.89) ) for pid 16239

----- Original message -----
From: "aderumier"
To: "Thomas Lamprecht"
Cc: "Proxmox VE development discussion"
Sent: Tuesday, September 15, 2020 16:57:46
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

>> I mean this is bad, but also great!
>> Can you do a coredump of the whole thing and upload it somewhere with the version info used (for dbgsym package)?
>> That could help a lot.

I'll try to reproduce it again (with the full lock everywhere) and do the coredump.

I have tried the real-time scheduling, but I was still able to reproduce the "lrm too long" message for 60s (though, as I'm restarting corosync each minute, I think something gets unlocked again at the next corosync restart).

This time it was blocked, at the same moment, on one node in (excerpt from the LRM work() loop):

sub work {
    ...
    } elsif ($state eq 'active') {
        ...
        $self->update_lrm_status();

and on another node in:

    if ($fence_request) {
        $haenv->log('err', "node need to be fenced - releasing agent_lock\n");
        $self->set_local_status({ state => 'lost_agent_lock' });
    } elsif (!$self->get_protected_ha_agent_lock()) {
        $self->set_local_status({ state => 'lost_agent_lock' });
    } elsif ($self->{mode} eq 'maintenance') {
        $self->set_local_status({ state => 'maintenance' });
    }

----- Original message -----
From: "Thomas Lamprecht"
To: "aderumier"
Cc: "Proxmox VE development discussion"
Sent: Tuesday, September 15, 2020 16:32:52
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/15/20 4:09 PM, Alexandre DERUMIER wrote:
>>> Can you try to give pmxcfs real-time scheduling, e.g., by doing:
>>>
>>> # systemctl edit pve-cluster
>>>
>>> and then adding the snippet:
>>>
>>> [Service]
>>> CPUSchedulingPolicy=rr
>>> CPUSchedulingPriority=99
> yes, sure, I'll do it now
>
> I'm currently digging the logs
>>> Is your simplest/most stable reproducer still a periodic restart of corosync on one node?
> yes, a simple "systemctl restart corosync" on 1 node each minute
>
> After 1 hour, it's still locked.
>
> on other nodes, I still have pmxcfs logs like:

I mean this is bad, but also great!
Can you do a coredump of the whole thing and upload it somewhere with the version info used (for dbgsym package)? That could help a lot.

> manual "pmxcfs -d"
> https://gist.github.com/aderumier/4cd91d17e1f8847b93ea5f621f257c2e

Hmm, the fuse connection of the previous one got into a weird state (or something is still running), but I'd rather say this is a side effect, not directly connected to the real bug.

> some interesting dmesg about "pvesr"
>
> [Tue Sep 15 14:45:34 2020] INFO: task pvesr:19038 blocked for more than 120 seconds.
> [Tue Sep 15 14:45:34 2020] Tainted: P O 5.4.60-1-pve #1
> [Tue Sep 15 14:45:34 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Sep 15 14:45:34 2020] pvesr D 0 19038 1 0x00000080
> [Tue Sep 15 14:45:34 2020] Call Trace:
> [Tue Sep 15 14:45:34 2020] __schedule+0x2e6/0x6f0
> [Tue Sep 15 14:45:34 2020] ? filename_parentat.isra.57.part.58+0xf7/0x180
> [Tue Sep 15 14:45:34 2020] schedule+0x33/0xa0
> [Tue Sep 15 14:45:34 2020] rwsem_down_write_slowpath+0x2ed/0x4a0
> [Tue Sep 15 14:45:34 2020] down_write+0x3d/0x40
> [Tue Sep 15 14:45:34 2020] filename_create+0x8e/0x180
> [Tue Sep 15 14:45:34 2020] do_mkdirat+0x59/0x110
> [Tue Sep 15 14:45:34 2020] __x64_sys_mkdir+0x1b/0x20
> [Tue Sep 15 14:45:34 2020] do_syscall_64+0x57/0x190
> [Tue Sep 15 14:45:34 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9

hmm, hangs in mkdir (cluster-wide locking)
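To make that last remark concrete: /etc/pve is the FUSE mount served by pmxcfs, and cluster-wide locks there are taken by creating a directory under it, so a mkdir() issued by pvesr can only return once pmxcfs (and, behind it, corosync) answers. The following is a minimal, hypothetical Perl sketch of such a mkdir-based lock attempt; the lock path and timeout are invented for illustration, and this is not code from pve-manager:

#!/usr/bin/perl
# Hypothetical sketch: take a cluster-wide lock by creating a directory
# on the pmxcfs mount. Path and timeout are invented for illustration.
use strict;
use warnings;

my $lockdir = '/etc/pve/priv/lock/example-sketch';

my $got_lock = eval {
    local $SIG{ALRM} = sub { die "lock timeout\n" };
    alarm(10);    # bounds the wait in the normal case
    # If pmxcfs is wedged, this mkdir blocks inside the kernel in
    # uninterruptible sleep (the "D" state in the trace above, via
    # down_write/rwsem_down_write_slowpath), so the SIGALRM handler
    # cannot run until the syscall returns; that is why only the
    # kernel's hung_task watchdog reports the stall after 120s.
    mkdir($lockdir) or die "mkdir failed: $!\n";
    alarm(0);
    1;
};

if ($got_lock) {
    print "acquired cluster-wide lock at $lockdir\n";
    rmdir($lockdir);    # release the lock by removing the directory
} else {
    warn "no lock: $@";
}

This would also fit the reproducer: pvesr runs periodically (once a minute via its systemd timer), so once one instance is stuck in D state behind a wedged pmxcfs, hung-task warnings like the one above are expected.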