public inbox for pve-user@lists.proxmox.com
From: Alwin Antreich via pve-user <pve-user@lists.proxmox.com>
To: iztok.gregori@elettra.eu
Cc: Alwin Antreich <alwin@antreich.com>,
	Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: Wed, 08 Jan 2025 12:02:14 +0000
Message-ID: <mailman.151.1736341029.441.pve-user@lists.proxmox.com>
In-Reply-To: <1be81920-ed5b-4b96-938a-4f35551b9ce5@elettra.eu>

From: "Alwin Antreich" <alwin@antreich.com>
To: iztok.gregori@elettra.eu
Cc: "Proxmox VE user list" <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Corosync and Cluster reboot
Date: Wed, 08 Jan 2025 12:02:14 +0000
Message-ID: <eecf1a85ed7de3658e28e654ea63759fdc08292a@antreich.com>

Hi Iztok,


January 8, 2025 at 11:12 AM, "Iztok Gregori" <iztok.gregori@elettra.eu> wrote:


> 
> Hi!
> 
> On 07/01/25 15:15, DERUMIER, Alexandre wrote:
> 
> > 
> > Personally, I'd recommend disabling HA temporarily during the network change (mv /etc/pve/ha/resources.cfg to a tmp directory, stop all pve-ha-lrm, then stop all pve-ha-crm to stop the watchdog).
> > 
> > Then, after the migration, check the corosync logs for 1 or 2 days and, if no retransmits occur, re-enable HA.
> > 
> Good advice. But with the pve-ha-* services down the "HA-VMs" cannot 
> migrate from one node to another, because the migration is handled by 
> HA (or at least that is how I remember it working some time ago). So 
> I've (temporarily) removed all the resources (VMs) from HA, which has the 
> effect of telling "pve-ha-lrm" to disable the watchdog ("watchdog closed 
> (disabled)"), so no reboot should occur.
Yes. Once no resource is under HA, the LRM becomes idle and the watchdog is closed after a minute or two.
I second Alexandre's recommendation when working on the corosync network/config.
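
To make "check the corosync logs" a bit more concrete, here is a minimal Python sketch that counts retransmit messages in log text. It assumes corosync reports them on lines containing "Retransmit List:"; the function name and the sample lines are made up for illustration and are not output from this cluster.

```python
from typing import Iterable


def count_retransmits(lines: Iterable[str], marker: str = "Retransmit List:") -> int:
    """Count log lines that contain the retransmit marker.

    Assumption: corosync words its retransmit messages as "Retransmit List:".
    Pass a different `marker` if your version logs them differently.
    """
    return sum(1 for line in lines if marker in line)


# Hypothetical sample lines, for illustration only.
sample = [
    "corosync[1234]:   [TOTEM ] Retransmit List: 1a 1b",
    "corosync[1234]:   [KNET  ] link: host: 2 link: 0 is up",
    "corosync[1234]:   [TOTEM ] Retransmit List: 1c",
]
print(count_retransmits(sample))
```

On a real node the lines could come from something like `journalctl -u corosync --since "2 days ago"` piped into a script that passes `sys.stdin` to the function. A count of zero over the observation period is the signal to re-enable HA.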

> 
> > 
> > It's quite possible that it's a corosync bug (I remember having had this kind of error with PVE 7.x).
> > 
> I'm leaning toward a similar conclusion, but I still don't fully 
> understand how corosync/watchdog is handled in Proxmox.
> 
> For example, I still don't know who is updating the watchdog-mux service. 
> Is it corosync (but no "watchdog_device" is set in corosync.conf and, per the 
> manual, "if unset, empty or "off", no watchdog is used.") or is it pve-ha-lrm?
The watchdog-mux service is handled by the LRM service.
The LRM holds a lock in /etc/pve once it becomes active. This allows the node to fence itself, since the watchdog is no longer updated when the node drops out of quorum. By default the softdog is used, but this can be changed to a hardware watchdog in /etc/default/pve-ha-manager.
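
As a rough illustration of the softdog default, the Python sketch below reads the text of that defaults file and reports which watchdog module would be used. It assumes the file uses shell-style KEY=value lines and that the relevant key is WATCHDOG_MODULE (that is how I remember it, please check your own file); the function name and the ipmi_watchdog value are only examples.

```python
def watchdog_module(config_text: str, default: str = "softdog") -> str:
    """Return the watchdog module named in the config text, else the default.

    Assumptions: shell-style KEY=value lines, '#' starts a comment line,
    and the key is called WATCHDOG_MODULE.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        key, sep, value = line.partition("=")
        if sep and key.strip() == "WATCHDOG_MODULE":
            # Drop optional surrounding quotes; an empty value means default.
            return value.strip().strip("\"'") or default
    return default


# A commented-out setting leaves the default in place.
print(watchdog_module("#WATCHDOG_MODULE=ipmi_watchdog\n"))
# An active setting selects the named (here: example) hardware module.
print(watchdog_module("WATCHDOG_MODULE=ipmi_watchdog\n"))
```

This only inspects the configuration. It does not tell you which module is actually loaded on a running node.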

> 
> I think that, after the migration, my best shot is to upgrade the 
> cluster, but I have to understand if newer libcephfs client libraries 
> support old Ceph clusters.
Ceph usually guarantees compatibility across roughly two major versions (e.g. Quincy -> Squid, Pacific -> Reef; unless stated otherwise).
A bigger version difference usually works as well, but it is strongly recommended to upgrade Ceph, as numerous bugs have been fixed over the past years.
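
The "two-ish major versions" rule of thumb can be written down as a small Python sketch. The release order is the public Ceph naming sequence; the threshold of 2 and the function names are my own shorthand for the rule above, not an official compatibility matrix.

```python
# Ceph major releases in order, oldest first (Luminous = 12 ... Squid = 19).
CEPH_RELEASES = [
    "luminous", "mimic", "nautilus", "octopus",
    "pacific", "quincy", "reef", "squid",
]


def release_gap(a: str, b: str) -> int:
    """Number of major releases between two Ceph release names."""
    return abs(CEPH_RELEASES.index(a.lower()) - CEPH_RELEASES.index(b.lower()))


def within_usual_range(client: str, cluster: str, max_gap: int = 2) -> bool:
    """True if client and cluster are at most `max_gap` major releases apart.

    Assumption: max_gap=2 encodes the rule of thumb, not a hard guarantee.
    """
    return release_gap(client, cluster) <= max_gap


print(within_usual_range("Squid", "Quincy"))    # two releases apart
print(within_usual_range("Squid", "Nautilus"))  # five releases apart
```

A False result does not mean the combination will fail, only that it falls outside the range Ceph normally commits to.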

Cheers,
Alwin
--
croit GmbH,
Consulting / Training / 24x7 Support
https://www.croit.io/services/proxmox


Thread overview: 9+ messages
2025-01-07 11:06 Iztok Gregori
2025-01-07 12:01 ` Gilberto Ferreira
2025-01-07 12:33   ` Gilberto Ferreira
2025-01-07 14:06     ` Iztok Gregori
2025-01-07 14:17       ` Gilberto Ferreira
2025-01-07 14:15     ` DERUMIER, Alexandre
2025-01-08 10:12       ` Iztok Gregori
2025-01-08 12:02         ` Alwin Antreich via pve-user [this message]
2025-01-08 12:53         ` proxmox
