From: "DERUMIER, Alexandre" <alexandre.derumier@groupe-cyllene.com>
To: "pve-user@lists.proxmox.com" <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: Tue, 7 Jan 2025 14:15:05 +0000
Message-ID: <061153a5c032dd89e04d7e3ef54b8fbcdce5fb24.camel@groupe-cyllene.com>
In-Reply-To: <CAOKSTBuFw1ihaCA7AF_iDHaSbHJXHREGLVmdPPuFEkR9L3Zjsg@mail.gmail.com>
Personally, I'd recommend temporarily disabling HA during the network change (move /etc/pve/ha/resources.cfg to a temporary directory, stop all pve-ha-lrm services, then stop all pve-ha-crm services to stop the watchdog).
Then, after the migration, watch the corosync logs for 1 or 2 days and, if no retransmits occur, re-enable HA.
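A rough sketch of that sequence, assuming the stock Proxmox service names and paths (the /root destination is just an example, any location outside /etc/pve works):

  # on one node: park the HA resource definitions outside the clustered filesystem
  mv /etc/pve/ha/resources.cfg /root/resources.cfg.disabled

  # on every node: stop the local resource manager first...
  systemctl stop pve-ha-lrm
  # ...and once all LRMs are stopped, stop the CRM so the watchdog is released
  systemctl stop pve-ha-crm

  # after the maintenance window, once the corosync logs have been quiet:
  systemctl start pve-ha-crm
  systemctl start pve-ha-lrm
  mv /root/resources.cfg.disabled /etc/pve/ha/resources.cfg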
It's quite possible that it's a corosync bug (I remember having this kind of error with PVE 7.x).
Also, for "big" clusters (20-30 nodes), I'm using sctp protocol now, instead udp. for me , it's a lot more reliable when you have a network saturation on 1 now.
(I had the case of interne udp flood attack coming from outside on 1 on my node, lagging the whole corosync cluster).²
corosync.conf:
totem {
  cluster_name: ....
  ....
  interface {
    knet_transport: sctp
    linknumber: 0
  }
  ....
(This needs a full restart of corosync everywhere, and HA needs to be disabled beforehand, because UDP nodes can't communicate with SCTP nodes, so you'll have a loss of quorum during the change.)
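One possible way to roll that out, sketched under the assumption that HA is already disabled and the edited corosync.conf has already reached every node (the node names are placeholders):

  # UDP and SCTP nodes cannot talk to each other, so restart corosync on
  # all nodes in quick succession to keep the quorum outage short
  for n in node01 node02 node03; do
      ssh "$n" 'systemctl restart corosync'
  done

  # afterwards, verify link status and membership
  corosync-cfgtool -s
  pvecm status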
-------- Original message --------
From: Gilberto Ferreira <gilberto.nunes32@gmail.com>
Reply-To: Proxmox VE user list <pve-user@lists.proxmox.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: 07/01/2025 13:33:41
Just to clarify, I had a similar issue in a low-latency network with a 12-node cluster, all with 1G ethernet cards.
After adding this token_retransmit to corosync.conf, no more problems.
Perhaps that could help you.
On Tue, 7 Jan 2025 at 09:01, Gilberto Ferreira <gilberto.nunes32@gmail.com> wrote:
Try adding this to corosync.conf on one of the nodes: token_retransmit: 200
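For reference, token_retransmit is a totem option, so the fragment would look roughly like this (the surrounding values are elided, as in the config above; corosync normally derives this timeout from the token timeout, so only set it explicitly if you have a reason to):

  totem {
    ....
    token_retransmit: 200
    ....
  }

On Proxmox the clustered copy of the file lives at /etc/pve/corosync.conf, and the config_version in it should be bumped when editing so the change propagates.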
On Tue, 7 Jan 2025 at 08:24, Iztok Gregori <iztok.gregori@elettra.eu> wrote:
Hi to all!
I need some help to understand a situation (a cluster reboot) which happened to us last week. We are running a 17-node Proxmox cluster with a separate Ceph cluster for storage (no hyper-convergence).
We had to upgrade a stack of 2 switches, and in order to avoid any downtime we decided to prepare a new (temporary) stack and move the links from one switch to the other. Our procedure was the following:
- Migrate all the VMs off the node.
- Unplug the links from the old switch.
- Plug the links into the temporary switch.
- Wait until the node is available again in the cluster (see the check sketched below).
- Repeat.
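For the "wait until available" step, a quick way to see that corosync has re-formed membership with the moved node (a generic check, not necessarily the one used here):

  pvecm status                   # quorum information and current member list
  corosync-cfgtool -s            # knet link status as seen from the local node
  journalctl -u corosync -n 20   # look for the "Sync joined" / "Members[...]" lines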
We had to move 8 nodes from one switch to the other. The first 4 nodes went smoothly, but when we plugged the 5th node into the new switch ALL the nodes which had HA VMs configured rebooted!
From the Corosync logs I can see that the token wasn't received, and because of that watchdog-mux wasn't updated, causing the node reboots.
Here are the Corosync logs during the procedure and before the nodes restarted. They were captured from a node which didn't reboot (pve-ha-lrm: idle):
12:51:57 [KNET ] link: host: 18 link: 0 is down
12:51:57 [KNET ] host: host: 18 (passive) best link: 0 (pri: 1)
12:51:57 [KNET ] host: host: 18 has no active links
12:52:02 [TOTEM ] Token has not been received in 9562 ms
12:52:16 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
12:52:16 [QUORUM] Sync left[1]: 18
12:52:16 [TOTEM ] A new membership (1.d29) was formed. Members left: 18
12:52:16 [TOTEM ] Failed to receive the leave message. failed: 18
12:52:16 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
12:52:16 [MAIN ] Completed service synchronization, ready to provide service.
12:52:42 [KNET ] rx: host: 18 link: 0 is up
12:52:42 [KNET ] host: host: 18 (passive) best link: 0 (pri: 1)
12:52:50 [TOTEM ] Token has not been received in 9567 ms
12:53:01 [TOTEM ] Token has not been received in 20324 ms
12:53:11 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
12:53:11 [TOTEM ] A new membership (1.d35) was formed. Members
12:53:20 [TOTEM ] Token has not been received in 9570 ms
12:53:31 [TOTEM ] Token has not been received in 20326 ms
12:53:41 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
12:53:41 [TOTEM ] A new membership (1.d41) was formed. Members
12:53:50 [TOTEM ] Token has not been received in 9570 ms
And here you can find the logs of a successfully completed "procedure":
12:19:12 [KNET ] link: host: 19 link: 0 is down
12:19:12 [KNET ] host: host: 19 (passive) best link: 0 (pri: 1)
12:19:12 [KNET ] host: host: 19 has no active links
12:19:17 [TOTEM ] Token has not been received in 9562 ms
12:19:31 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
12:19:31 [QUORUM] Sync left[1]: 19
12:19:31 [TOTEM ] A new membership (1.d21) was formed. Members left: 19
12:19:31 [TOTEM ] Failed to receive the leave message. failed: 19
12:19:31 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
12:19:31 [MAIN ] Completed service synchronization, ready to provide service.
12:19:47 [KNET ] rx: host: 19 link: 0 is up
12:19:47 [KNET ] host: host: 19 (passive) best link: 0 (pri: 1)
12:19:50 [QUORUM] Sync members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
12:19:50 [QUORUM] Sync joined[1]: 19
12:19:50 [TOTEM ] A new membership (1.d25) was formed. Members joined: 19
12:19:51 [QUORUM] Members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
12:19:51 [MAIN ] Completed service synchronization, ready to provide service.
Comparing the 2 logs I can see that after the "host: 18" link was found active again the token was still not received, but I cannot figure out what was different in this case.
I have 2 possible culprits:
1. NETWORK
The cluster network is backed by 5 Extreme Networks switches: 3 stacks of two x870 (100GbE), 1 stack of two x770 (40GbE), and one temporary stack of two 7720-32C (100GbE). The switches are linked together by a 2x LACP bond, and 99% of the cluster communication is on 100GbE.
The hosts are connected to the network with interfaces of different speeds: 10GbE (1 node), 25GbE (4 nodes), 40GbE (1 node), 100GbE (11 nodes). All the nodes are bonded; the Corosync network (which is the same as the management one) is defined on a bridge interface on top of the bonded link (the configuration is almost the same on all nodes; some older ones use balance-xor and the others use LACP as the bonding mode).
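For illustration, the kind of /etc/network/interfaces layout being described, with placeholder NIC names and addresses (not the actual configuration of these nodes):

  auto bond0
  iface bond0 inet manual
      bond-slaves eno1 eno2          # placeholder NIC names
      bond-mode 802.3ad              # balance-xor on the older nodes
      bond-miimon 100

  auto vmbr0
  iface vmbr0 inet static
      address 192.0.2.10/24          # example management/corosync address
      gateway 192.0.2.1
      bridge-ports bond0
      bridge-stp off
      bridge-fd 0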
It is possible that there is something wrong with the network, but I cannot find a probable cause. From the data that I have, I don't see anything special: no links were saturated, no errors logged...
2. COROSYNC
The cluster is running an OLD version of Proxmox (7.1-12) with Corosync 3.1.5-pve2. It is possible that there is a problem in Corosync that was fixed in a later release; I did a quick search but I didn't find anything. The cluster upgrade is on my to-do list (but the list is huge, so it will not be done tomorrow).
We are running only one Corosync network, which is the same as the management/migration one, but different from the one for client/storage/backup. The configuration is very basic, I think it is the default one; I can provide it if needed.
I checked the Corosync stats and the average latency is around 150 (microseconds?) across all links on all nodes.
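In case it is useful to someone reproducing the check, with corosync 3 the per-link knet statistics (including the average latency) can be read from the stats map:

  corosync-cmapctl -m stats | grep latency_ave

The keys look like stats.knet.nodeX.linkY.latency_ave, with one entry per peer node and link.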
====
In general it could be a combination of the 2 above or something completely different.
Do you have some advice on where to look to debug further?
I can provide more information if needed.
Thanks a lot!
Iztok
--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Thread overview: 9+ messages
2025-01-07 11:06 Iztok Gregori
2025-01-07 12:01 ` Gilberto Ferreira
2025-01-07 12:33 ` Gilberto Ferreira
2025-01-07 14:06 ` Iztok Gregori
2025-01-07 14:17 ` Gilberto Ferreira
2025-01-07 14:15 ` DERUMIER, Alexandre [this message]
2025-01-08 10:12 ` Iztok Gregori
2025-01-08 12:02 ` Alwin Antreich via pve-user
2025-01-08 12:53 ` proxmox