From: Iztok Gregori <iztok.gregori@elettra.eu>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: [PVE-User] Corosync and Cluster reboot
Date: Tue, 7 Jan 2025 12:06:54 +0100
Message-ID: <17af1712-1aa7-4f72-bd90-1e45d1361e45@elettra.eu>
Hi to all!
I need some help understanding a situation (a cluster reboot) that
happened to us last week. We are running a 17-node Proxmox cluster
with a separate Ceph cluster for storage (no hyper-convergence).
We had to upgrade a stack of 2 switches, and in order to avoid any
downtime we decided to prepare a new (temporary) stack and move the
links from one switch to the other. Our procedure was the following:
- Migrate all the VMs off the node.
- Unplug the links from the old switch.
- Plug the links into the temporary switch.
- Wait until the node is available again in the cluster.
- Repeat.
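The "wait until the node is available again" step can be checked against Corosync log output like the excerpts in this message. The following is only a minimal sketch, assuming the `[QUORUM] Members[N]: ...` line format shown in the logs; the function name `node_rejoined` is hypothetical, and on a live cluster `pvecm status` is the more usual place to look.

```python
import re

# Matches Corosync summary lines such as
# "12:19:51 [QUORUM] Members[17]: 1 2 3 4 7 8 ... 19"
# ("Sync members[...]" lines are deliberately not matched).
MEMBERS_RE = re.compile(r"\[QUORUM\] Members\[(\d+)\]:\s*([\d ]+)")


def node_rejoined(log_lines, node_id):
    """Return True if the most recent QUORUM 'Members' line lists node_id."""
    last_members = None
    for line in log_lines:
        m = MEMBERS_RE.search(line)
        if m:
            # Keep only the latest membership seen so far.
            last_members = {int(n) for n in m.group(2).split()}
    return last_members is not None and node_id in last_members
```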
We had to move 8 nodes from one switch to the other. The first 4 nodes
went smoothly, but when we plugged the 5th node into the new switch, ALL
the nodes with HA VMs configured rebooted!
From the Corosync logs I can see that the token was not received; because
of that, watchdog-mux was not updated, which caused the nodes to reboot.
Here are the Corosync logs from during the procedure, up to the moment the
nodes restarted. They were captured on a node that did not reboot
(pve-ha-lrm: idle):
> 12:51:57 [KNET ] link: host: 18 link: 0 is down
> 12:51:57 [KNET ] host: host: 18 (passive) best link: 0 (pri: 1)
> 12:51:57 [KNET ] host: host: 18 has no active links
> 12:52:02 [TOTEM ] Token has not been received in 9562 ms
> 12:52:16 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:52:16 [QUORUM] Sync left[1]: 18
> 12:52:16 [TOTEM ] A new membership (1.d29) was formed. Members left: 18
> 12:52:16 [TOTEM ] Failed to receive the leave message. failed: 18
> 12:52:16 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:52:16 [MAIN ] Completed service synchronization, ready to provide service.
> 12:52:42 [KNET ] rx: host: 18 link: 0 is up
> 12:52:42 [KNET ] host: host: 18 (passive) best link: 0 (pri: 1)
> 12:52:50 [TOTEM ] Token has not been received in 9567 ms
> 12:53:01 [TOTEM ] Token has not been received in 20324 ms
> 12:53:11 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:53:11 [TOTEM ] A new membership (1.d35) was formed. Members
> 12:53:20 [TOTEM ] Token has not been received in 9570 ms
> 12:53:31 [TOTEM ] Token has not been received in 20326 ms
> 12:53:41 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:53:41 [TOTEM ] A new membership (1.d41) was formed. Members
> 12:53:50 [TOTEM ] Token has not been received in 9570 ms
And here are the logs of a successfully completed "procedure":
> 12:19:12 [KNET ] link: host: 19 link: 0 is down
> 12:19:12 [KNET ] host: host: 19 (passive) best link: 0 (pri: 1)
> 12:19:12 [KNET ] host: host: 19 has no active links
> 12:19:17 [TOTEM ] Token has not been received in 9562 ms
> 12:19:31 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
> 12:19:31 [QUORUM] Sync left[1]: 19
> 12:19:31 [TOTEM ] A new membership (1.d21) was formed. Members left: 19
> 12:19:31 [TOTEM ] Failed to receive the leave message. failed: 19
> 12:19:31 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
> 12:19:31 [MAIN ] Completed service synchronization, ready to provide service.
> 12:19:47 [KNET ] rx: host: 19 link: 0 is up
> 12:19:47 [KNET ] host: host: 19 (passive) best link: 0 (pri: 1)
> 12:19:50 [QUORUM] Sync members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
> 12:19:50 [QUORUM] Sync joined[1]: 19
> 12:19:50 [TOTEM ] A new membership (1.d25) was formed. Members joined: 19
> 12:19:51 [QUORUM] Members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
> 12:19:51 [MAIN ] Completed service synchronization, ready to provide service.
Comparing the 2 logs, I can see that after the "host: 18" link came up
again, node 18 never rejoined the membership and the token was repeatedly
not received, but I cannot figure out what was different in this case.
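The comparison above can be made mechanical: for each KNET "link is up" event, check whether that host shows up in a "Members joined" line shortly afterwards, and count the token warnings logged in the same window. This is a minimal sketch, assuming the exact log line formats quoted in this message; `rejoin_report` and the 60-second window are hypothetical choices, not part of Corosync.

```python
import re
from datetime import datetime

# Line formats taken from the log excerpts above (optional "> " quoting is fine).
LINK_UP_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \[KNET \] rx: host: (\d+) link: \d+ is up")
JOINED_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \[TOTEM \] A new membership \(\S+\) was formed\. "
    r"Members joined: ([\d ]+)"
)
TOKEN_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \[TOTEM \] Token has not been received in (\d+) ms")


def _ts(hms):
    # Time of day only; a window crossing midnight is not handled.
    return datetime.strptime(hms, "%H:%M:%S")


def rejoin_report(log_lines, window_s=60):
    """Return (host, joined, token_warnings) for every KNET link-up event."""
    ups, joins, tokens = [], [], []
    for line in log_lines:
        if m := LINK_UP_RE.search(line):
            ups.append((_ts(m.group(1)), int(m.group(2))))
        elif m := JOINED_RE.search(line):
            joins.append((_ts(m.group(1)), {int(n) for n in m.group(2).split()}))
        elif m := TOKEN_RE.search(line):
            tokens.append(_ts(m.group(1)))

    report = []
    for t_up, host in ups:
        def in_window(t):
            return 0 <= (t - t_up).total_seconds() <= window_s

        joined = any(in_window(t) and host in ids for t, ids in joins)
        warnings = sum(1 for t in tokens if in_window(t))
        report.append((host, joined, warnings))
    return report
```

Fed the two excerpts above, this would report host 19 as rejoined with no token warnings, and host 18 as never rejoining, with several token warnings in the minute after its link came back.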
I have 2 possible culprits:
1. NETWORK
The cluster network is built on 5 Extreme Networks switch stacks: 3
stacks of two x870 (100GbE), 1 stack of two x770 (40GbE) and one
temporary stack of two 7720-32C (100GbE). The switches are linked
together by 2x LACP bonds, and 99% of the cluster communication runs
over 100GbE.
The hosts are connected to the network with interfaces of different
speeds: 10GbE (1 node), 25GbE (4 nodes), 40GbE (1 node) and 100GbE (11
nodes). All the nodes use bonding; the Corosync network (which is the
same as the management one) is defined on a bridge interface on top of
the bond. The configuration is almost the same on all nodes: some older
ones use balance-xor as the bonding mode, the others use LACP.
It is possible that something is wrong with the network, but I cannot
find a probable cause. From the data I have, I don't see anything
special: no links were saturated and no errors were logged...
2. COROSYNC
The cluster is running an OLD version of Proxmox VE (7.1-12) with
Corosync 3.1.5-pve2. It is possible that there is a problem in Corosync
that was fixed in a later release. I did a quick search but didn't find
anything. The cluster upgrade is on my to-do list (but the list is huge,
so it will not be done tomorrow).
We are running only one Corosync network, which is shared with the
management/migration traffic but separate from the client/storage/backup
network. The configuration is very basic (I think it is the default
one); I can provide it if needed.
I checked the Corosync stats, and the average latency is around 150
(microseconds?) on all links on all nodes.
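If the totem section really is the default, the "9562 ms" in the first token warnings can be explained arithmetically. corosync.conf(5) documents a default `token` of 3000 ms, a default `token_coefficient` of 650 ms (used when the nodelist has at least 3 nodes, giving token + (nodes - 2) * token_coefficient), and a default `token_warning` of 75% of the token timeout. The sketch below assumes those defaults and 17 nodes; the function names are hypothetical.

```python
def effective_token_ms(nodes, token_ms=3000, token_coefficient_ms=650):
    """Real totem token timeout per corosync.conf(5), assuming default values:
    token + (nodes - 2) * token_coefficient when the nodelist has >= 3 nodes."""
    if nodes >= 3:
        return token_ms + (nodes - 2) * token_coefficient_ms
    return token_ms


def token_warning_ms(nodes, warning_pct=75, **timeouts):
    """Elapsed time at which 'Token has not been received' would first be
    logged, assuming the default token_warning of 75 percent."""
    return effective_token_ms(nodes, **timeouts) * warning_pct / 100


print(effective_token_ms(17))  # 12750
print(token_warning_ms(17))    # 9562.5
```

Under these assumptions the first warning lands at about 9562.5 ms, which lines up with the 9562-9570 ms values in both logs, so the warnings themselves look like Corosync's normal 75% notice for a 17-node default configuration.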
====
In general, it could be a combination of the two above or something
completely different.
Do you have any advice on where to look to debug this further?
I can provide more information if needed.
Thanks a lot!
Iztok
--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user