From: Piviul
To: pve-user@lists.proxmox.com
Date: Mon, 7 Nov 2022 08:44:03 +0100
Subject: [PVE-User] Quorum Activity blocked

Good morning,

in a 3-node Proxmox 6.4 cluster, all 3 nodes seem to work and all running VM guests continue to work, but if I try to start a VM guest, the start fails with the message: "cluster not ready - no quorum? (500)".

This is the cluster manager status:

# pvecm status
Cluster information
-------------------
Name:             CSA-cluster1
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Nov  7 08:37:20 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.91e
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.255.2 (local)

These are the first syslog entries showing that a problem occurred:

Nov  4 23:38:01 pve02 systemd[1]: Started Proxmox VE replication runner.
Nov  4 23:38:26 pve02 corosync[1703]:   [KNET  ] link: host: 3 link: 0 is down
Nov  4 23:38:26 pve02 corosync[1703]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov  4 23:38:26 pve02 corosync[1703]:   [KNET  ] host: host: 3 has no active links
Nov  4 23:38:28 pve02 corosync[1703]:   [TOTEM ] Token has not been received in 2737 ms
Nov  4 23:38:30 pve02 corosync[1703]:   [KNET  ] rx: host: 3 link: 0 is up
Nov  4 23:38:30 pve02 corosync[1703]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov  4 23:38:32 pve02 corosync[1703]:   [QUORUM] Sync members[2]: 1 2
Nov  4 23:38:32 pve02 corosync[1703]:   [QUORUM] Sync left[1]: 3
Nov  4 23:38:32 pve02 corosync[1703]:   [TOTEM ] A new membership (1.873) was formed. Members left: 3
Nov  4 23:38:32 pve02 corosync[1703]:   [TOTEM ] Failed to receive the leave message. failed: 3
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: members: 1/1626, 2/1578
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: starting data syncronisation
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: members: 1/1626, 2/1578
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: starting data syncronisation
Nov  4 23:38:32 pve02 corosync[1703]:   [QUORUM] Members[2]: 1 2
Nov  4 23:38:32 pve02 corosync[1703]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: received sync request (epoch 1/1626/00000009)
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: received sync request (epoch 1/1626/00000009)
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: received all states
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: leader is 1/1626
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: synced members: 1/1626, 2/1578
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: all data is up to date
Nov  4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: dfsm_deliver_queue: queue length 2
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: received all states
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: all data is up to date
Nov  4 23:38:32 pve02 pmxcfs[1578]: [status] notice: dfsm_deliver_queue: queue length 46
Nov  4 23:38:34 pve02 corosync[1703]:   [KNET  ] link: host: 3 link: 0 is down
Nov  4 23:38:34 pve02 corosync[1703]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov  4 23:38:34 pve02 corosync[1703]:   [KNET  ] host: host: 3 has no active links
Nov  4 23:38:41 pve02 corosync[1703]:   [KNET  ] link: host: 1 link: 0 is down
Nov  4 23:38:41 pve02 corosync[1703]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov  4 23:38:41 pve02 corosync[1703]:   [KNET  ] host: host: 1 has no active links
Nov  4 23:38:42 pve02 corosync[1703]:   [TOTEM ] Token has not been received in 2737 ms
Nov  4 23:38:43 pve02 corosync[1703]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Nov  4 23:38:48 pve02 corosync[1703]:   [QUORUM] Sync members[1]: 2
Nov  4 23:38:48 pve02 corosync[1703]:   [QUORUM] Sync left[1]: 1
Nov  4 23:38:48 pve02 corosync[1703]:   [TOTEM ] A new membership (2.877) was formed. Members left: 1
Nov  4 23:38:48 pve02 corosync[1703]:   [TOTEM ] Failed to receive the leave message. failed: 1
Nov  4 23:38:48 pve02 pmxcfs[1578]: [dcdb] notice: members: 2/1578
Nov  4 23:38:48 pve02 pmxcfs[1578]: [status] notice: members: 2/1578
Nov  4 23:38:48 pve02 corosync[1703]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov  4 23:38:48 pve02 corosync[1703]:   [QUORUM] Members[1]: 2
Nov  4 23:38:48 pve02 corosync[1703]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  4 23:38:48 pve02 pmxcfs[1578]: [status] notice: node lost quorum
Nov  4 23:38:48 pve02 pmxcfs[1578]: [dcdb] crit: received write while not quorate - trigger resync
Nov  4 23:38:48 pve02 pmxcfs[1578]: [dcdb] crit: leaving CPG group
Nov  4 23:38:48 pve02 pve-ha-lrm[1943]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve02/lrm_status.tmp.1943' - Permission denied
Nov  4 23:38:49 pve02 pmxcfs[1578]: [dcdb] notice: start cluster connection
Nov  4 23:38:49 pve02 pmxcfs[1578]: [dcdb] crit: cpg_join failed: 14
Nov  4 23:38:49 pve02 pmxcfs[1578]: [dcdb] crit: can't initialize service
Nov  4 23:38:55 pve02 pmxcfs[1578]: [dcdb] notice: members: 2/1578
Nov  4 23:38:55 pve02 pmxcfs[1578]: [dcdb] notice: all data is up to date
Nov  4 23:39:00 pve02 systemd[1]: Starting Proxmox VE replication runner...
Nov  4 23:39:01 pve02 pvesr[2146320]: trying to acquire cfs lock 'file-replication_cfg' ...
[...]

What happened to my cluster? Does anyone have suggestions for troubleshooting the problem?

Piviul
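
A note on the numbers in the status output: with "Expected votes: 3" and "Quorum: 2", a node that sees only its own vote ("Total votes: 1") is below the majority threshold, which matches "Quorate: No" and "Activity blocked". The small Python sketch below only illustrates that arithmetic. It assumes the plain majority rule (more than half of the expected votes) and no special votequorum options; the function names are made up for this example and are not part of any Proxmox or corosync tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: more than half of the expected votes (assumed rule)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently visible reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values copied from the pvecm status output in this message
expected_votes = 3
total_votes = 1

print(quorum_threshold(expected_votes))         # 2, same as "Quorum: 2"
print(is_quorate(total_votes, expected_votes))  # False: one vote is not enough
print(is_quorate(2, expected_votes))            # True: two nodes together would be quorate
```

Under that assumption, the node becomes quorate again as soon as it can see at least one of the other two nodes.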