From: "Mariusz Suchodolski" <mariusz.suchodolski@suzuki.com.pl>
To: "'Proxmox VE user list'" <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Quorum Activity blocked
Date: Mon, 7 Nov 2022 09:32:01 +0100
Message-ID: <011a01d8f283$69085160$3b18f420$@suzuki.com.pl>
In-Reply-To: <7523843a-2cdb-fd3a-1bb5-3423f47ee7ab@riminilug.it>
Hi Piviul,
Is the output of "pvecm status" the same on all machines?
This looks like the same issue I had some time ago: https://forum.proxmox.com/threads/large-delay-on-pvecm-status-webui-unresponsive-node-failed-to-rejoin-cluster.96402/
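One way to compare the "pvecm status" output of several nodes is to save each node's output as text and diff a few key fields. The following is only a minimal sketch: the function names and the choice of fields are illustrative assumptions, not part of any Proxmox tooling, and the text for each node has to be collected by hand.

```python
import re

# Fields worth comparing between nodes (an illustrative selection).
FIELDS = ("Nodes", "Quorate", "Expected votes", "Total votes", "Ring ID")


def parse_pvecm_status(text: str) -> dict:
    """Extract selected 'Key: value' fields from saved `pvecm status` output."""
    result = {}
    for field in FIELDS:
        match = re.search(rf"^{re.escape(field)}:\s*(.+?)\s*$", text, re.MULTILINE)
        if match:
            result[field] = match.group(1)
    return result


def differing_fields(outputs: dict) -> dict:
    """Return every field whose value is not identical on all nodes.

    `outputs` maps a node name to the raw `pvecm status` text from that node.
    """
    parsed = {node: parse_pvecm_status(text) for node, text in outputs.items()}
    diffs = {}
    for field in FIELDS:
        values = {node: fields.get(field) for node, fields in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[field] = values
    return diffs
```

If every node reports "Quorate: No" with "Nodes: 1", each node is isolated on its own; if two nodes still see each other, the problem is more likely limited to the third node or its link.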
MS.
-----Original Message-----
From: pve-user <pve-user-bounces@lists.proxmox.com> On Behalf Of Piviul
Sent: Monday, November 7, 2022 8:44 AM
To: pve-user@lists.proxmox.com
Subject: [PVE-User] Quorum Activity blocked
Good morning, in a 3-node Proxmox 6.4 cluster all 3 nodes seem to work and all VM guests keep running, but if I try to start a VM guest, the start fails with the message: "cluster not ready - no quorum? (500)". This is the cluster manager status:
# pvecm status
Cluster information
-------------------
Name: CSA-cluster1
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Mon Nov 7 08:37:20 2022
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2.91e
Quorate: No
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.255.2 (local)
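The numbers in this output explain the "Activity blocked" state: with 3 expected votes, a majority of 2 is required, and this node only sees its own single vote. A minimal sketch of that arithmetic follows. It assumes the default majority rule of corosync votequorum and ignores special options such as two_node or last_man_standing; the function names are illustrative.

```python
def votequorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the majority threshold."""
    return total_votes >= votequorum_threshold(expected_votes)


# "Expected votes" and "Total votes" as reported by pvecm status above.
expected, total = 3, 1
print(votequorum_threshold(expected))  # 2
print(is_quorate(total, expected))     # False
```

As soon as this node can reach one more member, the total rises to 2 and the partition becomes quorate again.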
These are the first syslog entries showing that a problem occurred:
Nov 4 23:38:01 pve02 systemd[1]: Started Proxmox VE replication runner.
Nov 4 23:38:26 pve02 corosync[1703]: [KNET ] link: host: 3 link: 0 is down
Nov 4 23:38:26 pve02 corosync[1703]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov 4 23:38:26 pve02 corosync[1703]: [KNET ] host: host: 3 has no active links
Nov 4 23:38:28 pve02 corosync[1703]: [TOTEM ] Token has not been received in 2737 ms
Nov 4 23:38:30 pve02 corosync[1703]: [KNET ] rx: host: 3 link: 0 is up
Nov 4 23:38:30 pve02 corosync[1703]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov 4 23:38:32 pve02 corosync[1703]: [QUORUM] Sync members[2]: 1 2
Nov 4 23:38:32 pve02 corosync[1703]: [QUORUM] Sync left[1]: 3
Nov 4 23:38:32 pve02 corosync[1703]: [TOTEM ] A new membership (1.873) was formed. Members left: 3
Nov 4 23:38:32 pve02 corosync[1703]: [TOTEM ] Failed to receive the leave message. failed: 3
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: members: 1/1626, 2/1578
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: starting data syncronisation
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: members: 1/1626, 2/1578
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: starting data syncronisation
Nov 4 23:38:32 pve02 corosync[1703]: [QUORUM] Members[2]: 1 2
Nov 4 23:38:32 pve02 corosync[1703]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: received sync request (epoch 1/1626/00000009)
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: received sync request (epoch 1/1626/00000009)
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: received all states
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: leader is 1/1626
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: synced members: 1/1626, 2/1578
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: all data is up to date
Nov 4 23:38:32 pve02 pmxcfs[1578]: [dcdb] notice: dfsm_deliver_queue: queue length 2
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: received all states
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: all data is up to date
Nov 4 23:38:32 pve02 pmxcfs[1578]: [status] notice: dfsm_deliver_queue: queue length 46
Nov 4 23:38:34 pve02 corosync[1703]: [KNET ] link: host: 3 link: 0 is down
Nov 4 23:38:34 pve02 corosync[1703]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Nov 4 23:38:34 pve02 corosync[1703]: [KNET ] host: host: 3 has no active links
Nov 4 23:38:41 pve02 corosync[1703]: [KNET ] link: host: 1 link: 0 is down
Nov 4 23:38:41 pve02 corosync[1703]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Nov 4 23:38:41 pve02 corosync[1703]: [KNET ] host: host: 1 has no active links
Nov 4 23:38:42 pve02 corosync[1703]: [TOTEM ] Token has not been received in 2737 ms
Nov 4 23:38:43 pve02 corosync[1703]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Nov 4 23:38:48 pve02 corosync[1703]: [QUORUM] Sync members[1]: 2
Nov 4 23:38:48 pve02 corosync[1703]: [QUORUM] Sync left[1]: 1
Nov 4 23:38:48 pve02 corosync[1703]: [TOTEM ] A new membership (2.877) was formed. Members left: 1
Nov 4 23:38:48 pve02 corosync[1703]: [TOTEM ] Failed to receive the leave message. failed: 1
Nov 4 23:38:48 pve02 pmxcfs[1578]: [dcdb] notice: members: 2/1578
Nov 4 23:38:48 pve02 pmxcfs[1578]: [status] notice: members: 2/1578
Nov 4 23:38:48 pve02 corosync[1703]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov 4 23:38:48 pve02 corosync[1703]: [QUORUM] Members[1]: 2
Nov 4 23:38:48 pve02 corosync[1703]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 4 23:38:48 pve02 pmxcfs[1578]: [status] notice: node lost quorum
Nov 4 23:38:48 pve02 pmxcfs[1578]: [dcdb] crit: received write while not quorate - trigger resync
Nov 4 23:38:48 pve02 pmxcfs[1578]: [dcdb] crit: leaving CPG group
Nov 4 23:38:48 pve02 pve-ha-lrm[1943]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve02/lrm_status.tmp.1943' - Permission denied
Nov 4 23:38:49 pve02 pmxcfs[1578]: [dcdb] notice: start cluster connection
Nov 4 23:38:49 pve02 pmxcfs[1578]: [dcdb] crit: cpg_join failed: 14
Nov 4 23:38:49 pve02 pmxcfs[1578]: [dcdb] crit: can't initialize service
Nov 4 23:38:55 pve02 pmxcfs[1578]: [dcdb] notice: members: 2/1578
Nov 4 23:38:55 pve02 pmxcfs[1578]: [dcdb] notice: all data is up to date
Nov 4 23:39:00 pve02 systemd[1]: Starting Proxmox VE replication runner...
Nov 4 23:39:01 pve02 pvesr[2146320]: trying to acquire cfs lock 'file-replication_cfg' ...
[...]
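To see the order of events in a log like this, it can help to reduce it to the corosync link and membership changes only. The sketch below is a rough aid, not an official parser: the regular expressions are assumptions based solely on the message wording in the excerpt above, and the function name is illustrative.

```python
import re

# Syslog-style timestamp such as "Nov 4 23:38:26".
TS = r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})"
PREFIX = TS + r"\s+\S+\s+corosync\[\d+\]:\s+"

# Patterns modelled on the corosync messages quoted in this thread.
PATTERNS = [
    ("link down", re.compile(
        PREFIX + r"\[KNET\s*\]\s+link: host: (?P<node>\d+) link: \d+ is down")),
    ("link up", re.compile(
        PREFIX + r"\[KNET\s*\]\s+rx: host: (?P<node>\d+) link: \d+ is up")),
    ("member left", re.compile(
        PREFIX + r"\[TOTEM\s*\]\s+A new membership\s+\(\S+\) was formed\. "
        r"Members left: (?P<node>\d+(?: \d+)*)")),
]


def corosync_events(log_text: str) -> list:
    """Return (timestamp, event, node ids) tuples in the order they appear."""
    found = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(log_text):
            found.append((m.start(), m.group("ts"), label, m.group("node")))
    found.sort()  # order by position in the log text
    return [(ts, label, node) for _, ts, label, node in found]
```

Applied to the excerpt above, this would list the link to host 3 going down first, followed by the link to host 1, after which node 2 is left alone without quorum.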
What has happened to my cluster? Does anyone have suggestions for troubleshooting the problem?
Piviul