public inbox for pve-user@lists.proxmox.com
* [PVE-User] Corosync and Cluster reboot
@ 2025-01-07 11:06 Iztok Gregori
  2025-01-07 12:01 ` Gilberto Ferreira
  0 siblings, 1 reply; 9+ messages in thread
From: Iztok Gregori @ 2025-01-07 11:06 UTC (permalink / raw)
  To: Proxmox VE user list

Hi to all!

I need some help understanding a situation (a cluster reboot) which
happened to us last week. We are running a 17-node Proxmox cluster
with a separate Ceph cluster for storage (no hyper-convergence).

We had to upgrade a stack of 2 switches, and in order to avoid any
downtime we decided to prepare a new (temporary) stack and move the
links from one switch to the other. Our procedure was the following
(a rough command-level sketch follows the list):

- Migrate all the VMs off the node.
- Unplug the links from the old switch.
- Plug the links to the temporary switch.
- Wait till the node is available again in the cluster.
- Repeat.
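Per node, the commands look roughly like this (only a sketch: the VMID loop
and the target node name are illustrative, not our exact commands):

# run on the node that is about to be re-cabled; "pve-target" is a placeholder
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate "$vmid" pve-target --online   # live-migrate every local VM away
done
# ...unplug the links from the old switch, plug them into the temporary stack...
watch -n 5 pvecm status                      # wait until the node is a member again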

We had to move 8 nodes from one switch to the other. The first 4 nodes
went smoothly, but when we plugged the 5th node into the new switch, ALL
the nodes with HA-configured VMs rebooted!

From the Corosync logs I see that the token wasn't received, and because
of that watchdog-mux wasn't updated, causing the node reboots.

Here are the Corosync logs during the procedure, just before the nodes
restarted. They were captured on a node which didn't reboot (pve-ha-lrm:
idle):

> 12:51:57 [KNET  ] link: host: 18 link: 0 is down
> 12:51:57 [KNET  ] host: host: 18 (passive) best link: 0 (pri: 1)
> 12:51:57 [KNET  ] host: host: 18 has no active links
> 12:52:02 [TOTEM ] Token has not been received in 9562 ms
> 12:52:16 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:52:16 [QUORUM] Sync left[1]: 18
> 12:52:16 [TOTEM ] A new membership (1.d29) was formed. Members left: 18
> 12:52:16 [TOTEM ] Failed to receive the leave message. failed: 18
> 12:52:16 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:52:16 [MAIN  ] Completed service synchronization, ready to provide service.
> 12:52:42 [KNET  ] rx: host: 18 link: 0 is up
> 12:52:42 [KNET  ] host: host: 18 (passive) best link: 0 (pri: 1)
> 12:52:50 [TOTEM ] Token has not been received in 9567 ms
> 12:53:01 [TOTEM ] Token has not been received in 20324 ms
> 12:53:11 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:53:11 [TOTEM ] A new membership (1.d35) was formed. Members
> 12:53:20 [TOTEM ] Token has not been received in 9570 ms
> 12:53:31 [TOTEM ] Token has not been received in 20326 ms
> 12:53:41 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 19
> 12:53:41 [TOTEM ] A new membership (1.d41) was formed. Members
> 12:53:50 [TOTEM ] Token has not been received in 9570 ms 

And here you can find the logs of a successfully completed "procedure":

> 12:19:12 [KNET  ] link: host: 19 link: 0 is down
> 12:19:12 [KNET  ] host: host: 19 (passive) best link: 0 (pri: 1)
> 12:19:12 [KNET  ] host: host: 19 has no active links
> 12:19:17 [TOTEM ] Token has not been received in 9562 ms
> 12:19:31 [QUORUM] Sync members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
> 12:19:31 [QUORUM] Sync left[1]: 19
> 12:19:31 [TOTEM ] A new membership (1.d21) was formed. Members left: 19
> 12:19:31 [TOTEM ] Failed to receive the leave message. failed: 19
> 12:19:31 [QUORUM] Members[16]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18
> 12:19:31 [MAIN  ] Completed service synchronization, ready to provide service.
> 12:19:47 [KNET  ] rx: host: 19 link: 0 is up
> 12:19:47 [KNET  ] host: host: 19 (passive) best link: 0 (pri: 1)
> 12:19:50 [QUORUM] Sync members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
> 12:19:50 [QUORUM] Sync joined[1]: 19
> 12:19:50 [TOTEM ] A new membership (1.d25) was formed. Members joined: 19
> 12:19:51 [QUORUM] Members[17]: 1 2 3 4 7 8 9 10 11 12 13 14 15 16 17 18 19
> 12:19:51 [MAIN  ] Completed service synchronization, ready to provide service. 

Comparing the 2 logs I can see that after the "host: 18" link was detected
as active again the token was still not received, but I cannot figure out
what went differently in this case.

I have 2 possible culprits:

1. NETWORK

The cluster network is backed by 5 Extreme Networks switch stacks: 3
stacks of two x870 (100GbE), 1 stack of two x770 (40GbE) and one
temporary stack of two 7720-32C (100GbE). The switches are linked
together by a 2x LACP bond, and 99% of the cluster communication is
on 100GbE.

The hosts are connected to the network with interfaces of different
speeds: 10GbE (1 node), 25GbE (4 nodes), 40GbE (1 node), 100GbE (11
nodes). All the nodes are bonded, and the Corosync network (which is the
same as the management one) is defined on a bridge interface on top of
the bonded link (the configuration is almost the same on all nodes; some
older ones use balance-xor, the others use LACP as the bonding mode).
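For reference, a node's /etc/network/interfaces looks roughly like this
(a sketch only: NIC names and addresses are placeholders, and the older
nodes use bond-mode balance-xor instead of 802.3ad):

auto bond0
iface bond0 inet manual
    bond-slaves enp65s0f0 enp65s0f1   # placeholder NIC names
    bond-mode 802.3ad                 # balance-xor on the older nodes
    bond-miimon 100

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.11/24             # management/Corosync address (placeholder)
    gateway 192.0.2.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0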

It is possible that there is something wrong with the network, but I
cannot find a probable cause. From the data that I have, I see nothing
special: no links were saturated, no errors logged...

2. COROSYNC

The cluster is running an OLD version of Proxmox (7.1-12) with Corosync
3.1.5-pve2. It is possible that there is a problem in Corosync which was
fixed in a later release. I did a quick search but didn't find anything.
The cluster upgrade is on my to-do list (but the list is huge, so it will
not happen tomorrow).

We are running only one Corosync network, which is the same as the
management/migration one but different from the one used for
client/storage/backup. The configuration is very basic, I think it is
the default one; I can provide it if needed.

I checked the Corosync stats and the average latency is around 150
(microseconds?) across all links on all nodes.
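I read them from the knet stats map, roughly like this (the exact key
names may vary between corosync versions):

corosync-cmapctl -m stats | grep latency_ave
# e.g. stats.knet.node18.link0.latency_ave = 150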

====

It could also be a combination of the 2 above, or something completely
different.

Do you have some advice on where to look to debug further?

I can provide more information if needed.

Thanks a lot!

Iztok



-- 
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 11:06 [PVE-User] Corosync and Cluster reboot Iztok Gregori
@ 2025-01-07 12:01 ` Gilberto Ferreira
  2025-01-07 12:33   ` Gilberto Ferreira
  0 siblings, 1 reply; 9+ messages in thread
From: Gilberto Ferreira @ 2025-01-07 12:01 UTC (permalink / raw)
  To: Proxmox VE user list

Try adding this to corosync.conf on one of the nodes:  token_retransmit: 200
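Something like this in the totem section (only a sketch of where it goes;
on PVE the file lives in /etc/pve/corosync.conf and, as far as I know,
config_version has to be bumped for the change to be applied):

totem {
  cluster_name: yourcluster     # keep your existing values
  config_version: 16            # increment on every edit (example value)
  token_retransmit: 200
  ...
}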







On Tue, 7 Jan 2025 at 08:24, Iztok Gregori <iztok.gregori@elettra.eu>
wrote:

> [...]
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 12:01 ` Gilberto Ferreira
@ 2025-01-07 12:33   ` Gilberto Ferreira
  2025-01-07 14:06     ` Iztok Gregori
  2025-01-07 14:15     ` DERUMIER, Alexandre
  0 siblings, 2 replies; 9+ messages in thread
From: Gilberto Ferreira @ 2025-01-07 12:33 UTC (permalink / raw)
  To: Proxmox VE user list

Just to clarify, I had a similar issue in a low-latency network with a
12-node cluster, all with 1G Ethernet cards.
After adding token_retransmit to corosync.conf, no more problems.
Perhaps that could help you.






On Tue, 7 Jan 2025 at 09:01, Gilberto Ferreira <gilberto.nunes32@gmail.com>
wrote:

> Try to add this in corosync.conf in one of the nodes:  token_retransmit:
> 200
>
> On Tue, 7 Jan 2025 at 08:24, Iztok Gregori <iztok.gregori@elettra.eu> wrote:
>
>> [...]
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 12:33   ` Gilberto Ferreira
@ 2025-01-07 14:06     ` Iztok Gregori
  2025-01-07 14:17       ` Gilberto Ferreira
  2025-01-07 14:15     ` DERUMIER, Alexandre
  1 sibling, 1 reply; 9+ messages in thread
From: Iztok Gregori @ 2025-01-07 14:06 UTC (permalink / raw)
  To: pve-user

On 07/01/25 13:33, Gilberto Ferreira wrote:
> Just to clarify, I had a similar issue in a low-latency network with a
> 12-node cluster, all with 1G Ethernet cards.
> After adding this token_retransmit to corosync.conf, no more problems.
> Perhaps that could help you.
> 
> On Tue, 7 Jan 2025 at 09:01, Gilberto Ferreira <
> gilberto.nunes32@gmail.com> wrote:
> 
>> Try to add this in corosync.conf in one of the nodes:  token_retransmit:
>> 200
>>

Hi and thank you for your suggestion!

Right now this is the configuration of the token in our cluster:

> root@aaa:~# corosync-cmapctl | grep token
> runtime.config.totem.token (u32) = 12750
> runtime.config.totem.token_retransmit (u32) = 3035
> runtime.config.totem.token_retransmits_before_loss_const (u32) = 4

The token_retransmit value is calculated automatically by the formula

token / (token_retransmits_before_loss_const + 0.2)

which in our case gives 3035 milliseconds (a quick numeric check follows
the questions below). I have 2 questions:

1. Because the value is calculated automatically, is it better to increase
the "token_retransmits_before_loss_const" value (or decrease the "token"
value) than to set a constant (200 milliseconds, for example)?
2. If I alter this value and a node is added/removed in the future, will
the value remain the same or will it be recalculated?
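For reference, the numeric check mentioned above (my reading of
corosync.conf(5); the base token of 3000 ms and the token_coefficient
default of 650 ms are assumptions on my part):

echo $(( 3000 + (17 - 2) * 650 ))   # 12750 -> runtime.config.totem.token for 17 nodes
echo $(( 12750 * 10 / 42 ))         # 3035  -> 12750 / (4 + 0.2) = token_retransmit
# The "Token has not been received in 9562 ms" lines are ~75% of 12750 ms,
# which would fit corosync warning at 75% of the token timeout.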


Thanks
Iztok

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 12:33   ` Gilberto Ferreira
  2025-01-07 14:06     ` Iztok Gregori
@ 2025-01-07 14:15     ` DERUMIER, Alexandre
  2025-01-08 10:12       ` Iztok Gregori
  1 sibling, 1 reply; 9+ messages in thread
From: DERUMIER, Alexandre @ 2025-01-07 14:15 UTC (permalink / raw)
  To: pve-user

Personally, I'd recommend disabling HA temporarily during the network change (mv /etc/pve/ha/resources.cfg to a tmp directory, stop all pve-ha-lrm, then stop all pve-ha-crm to stop the watchdog).

Then, after the migration, check the corosync logs for 1 or 2 days, and after that, if no retransmits occur, re-enable HA.
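Roughly like this (only a sketch; the backup location is arbitrary, the mv
is done once because /etc/pve is cluster-wide, the service stops run on
every node, LRMs first):

mv /etc/pve/ha/resources.cfg /root/ha-resources.cfg.bak
systemctl stop pve-ha-lrm    # on all nodes
systemctl stop pve-ha-crm    # on all nodes, after every LRM is stopped
# ...do the switch migration, then watch the corosync logs for 1-2 days...
systemctl start pve-ha-crm && systemctl start pve-ha-lrm   # then restore resources.cfg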



It's quite possible that it's a corosync bug (I remember having had this kind of error with PVE 7.x).


Also, for "big" clusters (20-30 nodes), I'm using sctp protocol now, instead udp. for me , it's a lot more reliable when you have a network saturation on 1 now.

(I had the case of a UDP flood attack coming from outside on one of my nodes, lagging the whole corosync cluster.)


corosync.conf

totem {
    cluster_name: ....
    ....
    interface {
        knet_transport: sctp
        linknumber: 0
    }
    ....
}


(This needs a full restart of corosync everywhere, and HA needs to be disabled first, because UDP can't communicate with SCTP, so you'll have a loss of quorum during the change.)
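In practice the rollout looks roughly like this (a sketch only; node names
are placeholders and HA must already be disabled as described above):

# 1. edit /etc/pve/corosync.conf: add knet_transport: sctp to the interface
#    subsection and bump config_version
# 2. restart corosync everywhere; quorum comes back once a majority of nodes
#    is running on the new transport
for n in node01 node02 node03; do     # ...all cluster nodes
    ssh "$n" systemctl restart corosync
done
pvecm status                          # check quorum and membership afterwards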




-------- Original message --------
From: Gilberto Ferreira <gilberto.nunes32@gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: 07/01/2025 13:33:41

[...]

_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 14:06     ` Iztok Gregori
@ 2025-01-07 14:17       ` Gilberto Ferreira
  0 siblings, 0 replies; 9+ messages in thread
From: Gilberto Ferreira @ 2025-01-07 14:17 UTC (permalink / raw)
  To: Proxmox VE user list

> 1. Because the value is calculated automatically, is it better to increase
> the "token_retransmits_before_loss_const" value (or decrease the "token"
> value) than to set a constant (200 milliseconds, for example)?

Well... as I said, in my scenario I just added the line with the value 200.
I'm afraid I lack the proper knowledge to go deeper.

> 2. If I alter this value and a node is added/removed in the future, will
> the value remain the same or will it be recalculated?

Yes! The corosync.conf in /etc/pve will be synced to any new server added to
the cluster, via pmxcfs.

Best regards
---


On Tue, 7 Jan 2025 at 11:06, Iztok Gregori <iztok.gregori@elettra.eu>
wrote:

> [...]
>
_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-07 14:15     ` DERUMIER, Alexandre
@ 2025-01-08 10:12       ` Iztok Gregori
  2025-01-08 12:02         ` Alwin Antreich via pve-user
  2025-01-08 12:53         ` proxmox
  0 siblings, 2 replies; 9+ messages in thread
From: Iztok Gregori @ 2025-01-08 10:12 UTC (permalink / raw)
  To: pve-user

Hi!

On 07/01/25 15:15, DERUMIER, Alexandre wrote:
> Personally, I'd recommend disabling HA temporarily during the network change (mv /etc/pve/ha/resources.cfg to a tmp directory, stop all pve-ha-lrm, then stop all pve-ha-crm to stop the watchdog).
> 
> Then, after the migration, check the corosync logs for 1 or 2 days, and after that, if no retransmits occur, re-enable HA.
> 

Good advice. But with the pve-ha-* services down the "HA VMs" cannot
migrate from one node to another, because the migration is handled by
HA (or at least that is how I remember it working some time ago). So
I've (temporarily) removed all the resources (VMs) from HA, which has
the effect of telling "pve-ha-lrm" to disable the watchdog ("watchdog
closed (disabled)") and no reboot should occur.

> It's quite possible that it's a corosync bug (I remember having had this kind of error with PVE 7.x).

I'm leaning towards a similar conclusion, but I'm still lacking an
understanding of how corosync/watchdog is handled in Proxmox.

For example, I still don't know what is updating the watchdog-mux service.
Is it corosync (but no "watchdog_device" is set in corosync.conf, and per
the manual "if unset, empty or "off", no watchdog is used.") or is it
pve-ha-lrm?

I think that, after the migration, my best shot is to upgrade the 
cluster, but I have to understand if newer libcephfs client libraries 
support old Ceph clusters.

> Also, for "big" clusters (20-30 nodes), I'm using sctp protocol now, instead udp. for me , it's a lot more reliable when you have a network saturation on 1 now.
> 
> (I had the case of interne  udp flood attack coming from outside on 1 on my node, lagging the whole corosync cluster).²
> 
> [...]

I've read about it, and I think I'll follow your suggestion. In those
bigger clusters, have you tinkered with corosync values such as "token"
or "token_retransmits_before_loss_const"?

Thank you!

Iztok


_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-08 10:12       ` Iztok Gregori
@ 2025-01-08 12:02         ` Alwin Antreich via pve-user
  2025-01-08 12:53         ` proxmox
  1 sibling, 0 replies; 9+ messages in thread
From: Alwin Antreich via pve-user @ 2025-01-08 12:02 UTC (permalink / raw)
  To: iztok.gregori; +Cc: Alwin Antreich, Proxmox VE user list

From: "Alwin Antreich" <alwin@antreich.com>
To: iztok.gregori@elettra.eu
Cc: "Proxmox VE user list" <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: Wed, 08 Jan 2025 12:02:14 +0000
Message-ID: <eecf1a85ed7de3658e28e654ea63759fdc08292a@antreich.com>

Hi Iztok,


January 8, 2025 at 11:12 AM, "Iztok Gregori" <iztok.gregori@elettra.eu> wrote:


> 
> Hi!
> 
> On 07/01/25 15:15, DERUMIER, Alexandre wrote:
> 
> > 
> > Personally, I'd recommend disabling HA temporarily during the network change (mv /etc/pve/ha/resources.cfg to a tmp directory, stop all pve-ha-lrm, then stop all pve-ha-crm to stop the watchdog).
> >
> > Then, after the migration, check the corosync logs for 1 or 2 days, and after that, if no retransmits occur, re-enable HA.
> >
> Good advice. But with the pve-ha-* services down the "HA VMs" cannot
> migrate from one node to another, because the migration is handled by
> HA (or at least that is how I remember it working some time ago). So
> I've (temporarily) removed all the resources (VMs) from HA, which has
> the effect of telling "pve-ha-lrm" to disable the watchdog ("watchdog
> closed (disabled)") and no reboot should occur.
Yes, after a minute or two when no resource is under HA the watchdog is closed (lrm becomes idle).
I second Alexandre's recommendation when working on the corosync network/config.

> 
> > 
> > It's quite possible that it's a corosync bug (I remember having had this kind of error with PVE 7.x).
> > 
> I'm leaning towards a similar conclusion, but I'm still lacking an
> understanding of how corosync/watchdog is handled in Proxmox.
> 
> For example, I still don't know what is updating the watchdog-mux service.
> Is it corosync (but no "watchdog_device" is set in corosync.conf, and per
> the manual "if unset, empty or "off", no watchdog is used.") or is it
> pve-ha-lrm?
The watchdog-mux service is handled by the LRM service.
The LRM holds a lock in /etc/pve when it becomes active. This allows the node to fence itself, since the watchdog isn't updated anymore when the node drops out of quorum. By default the softdog is used, but it can be changed to a hardware watchdog in /etc/default/pve-ha-manager.
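For example (a sketch; the module name depends on the hardware at hand):

# /etc/default/pve-ha-manager
WATCHDOG_MODULE=ipmi_watchdog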

> 
> I think that, after the migration, my best shot is to upgrade the 
> cluster, but I have to understand if newer libcephfs client libraries 
> support old Ceph clusters.
Ceph usually guarantees compatibility between two-ish major versions (e.g. Quincy -> Squid, Pacific -> Reef; unless stated otherwise).
Any bigger version difference usually works as well, but it is strongly recommended to upgrade Ceph, as numerous bugs have been fixed over the past years.

Cheers,
Alwin
--
croit GmbH,
Consulting / Training / 24x7 Support
https://www.croit.io/services/proxmox



_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PVE-User] Corosync and Cluster reboot
  2025-01-08 10:12       ` Iztok Gregori
  2025-01-08 12:02         ` Alwin Antreich via pve-user
@ 2025-01-08 12:53         ` proxmox
  1 sibling, 0 replies; 9+ messages in thread
From: proxmox @ 2025-01-08 12:53 UTC (permalink / raw)
  To: Proxmox VE user list

Hello, 

On 8 January 2025 at 11:12:02 CET, Iztok Gregori <iztok.gregori@elettra.eu> wrote:
>[...]
>
>For example, I still don't know what is updating the watchdog-mux service. Is it corosync (but no "watchdog_device" is set in corosync.conf, and per the manual "if unset, empty or "off", no watchdog is used.") or is it pve-ha-lrm?

As far as I can say, it's the pve-ha-* services.
If you don't use HA in your cluster, the watchdog isn't used.
That's how I understand it.

Hth


_______________________________________________
pve-user mailing list
pve-user@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-01-08 13:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-07 11:06 [PVE-User] Corosync and Cluster reboot Iztok Gregori
2025-01-07 12:01 ` Gilberto Ferreira
2025-01-07 12:33   ` Gilberto Ferreira
2025-01-07 14:06     ` Iztok Gregori
2025-01-07 14:17       ` Gilberto Ferreira
2025-01-07 14:15     ` DERUMIER, Alexandre
2025-01-08 10:12       ` Iztok Gregori
2025-01-08 12:02         ` Alwin Antreich via pve-user
2025-01-08 12:53         ` proxmox
