From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Stefan Radman <stefan.radman@me.com>,
	PVE User List <pve-user@pve.proxmox.com>
Subject: Re: [PVE-User] watchdog timeout hardcoded to 10 sec
Date: Fri, 10 Dec 2021 16:34:32 +0100
Message-ID: <f4d0a324-4d6e-1f9c-17b0-46912f6454dc@proxmox.com>
In-Reply-To: <9005E17C-0114-4AD3-9CED-D3615E853F7B@me.com>

Hi,

On 10.12.21 15:22, Stefan Radman wrote:
> What is the reason for hardcoding the watchdog timeout into pve-ha-manager/watchdog-mux.c?

Note that this is the multiplexer; the actual timeout for its clients is 60s.

The MUX opens the actual watchdog. It's a really small C program with a very small
footprint and static resource usage, so it won't ever fail to update the watchdog
in any situation where the system isn't totally lost.
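For illustration, a minimal C sketch of how a daemon can open a watchdog device and set its timeout through the standard Linux watchdog ioctl interface, i.e. the same `WDIOC_SETTIMEOUT` call quoted from watchdog-mux.c. This is not the MUX's actual code; the helper names `open_watchdog` and `feed_watchdog` are hypothetical. It also hints at why a timeout configured beforehand does not stick: a later `WDIOC_SETTIMEOUT` call replaces whatever value the driver had.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/watchdog.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: open a watchdog device (e.g. "/dev/watchdog") and
 * request a timeout in seconds. Opening the device typically arms the timer.
 * Returns the fd, or -1 on failure. On success, *timeout_s holds the
 * timeout the driver actually applied, which may differ from the request. */
int open_watchdog(const char *path, int *timeout_s)
{
    assert(path != NULL && timeout_s != NULL);

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    if (ioctl(fd, WDIOC_SETTIMEOUT, timeout_s) == -1) {
        /* Magic close: writing 'V' before close() asks drivers that support
         * it to disarm instead of letting the timer expire. */
        ssize_t n = write(fd, "V", 1);
        (void)n;
        close(fd);
        return -1;
    }
    return fd;
}

/* Ping the watchdog once; a daemon repeats this well within the timeout. */
int feed_watchdog(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

A caller would start with e.g. `int t = 10;`, call `open_watchdog("/dev/watchdog", &t)` and then keep calling `feed_watchdog(fd)` comfortably inside `t` seconds.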

The MUX then checks the actual clients: if those did not ping in the last 60s, the
MUX will stop updating the actual watchdog, causing a reset around 0s to 10s later.

So the in-practice timeout for the watchdog services the MUX provides is 60 to 70
seconds, not ten.
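The 60 to 70 second window can be reproduced with a small C simulation. It assumes the 10s hardware timeout from the quoted `watchdog_timeout = 10` and the 60s client timeout described above; the MUX's feed interval and all function names are hypothetical. The client goes silent at t = 0, and the function returns the second at which the hardware watchdog would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 10 s matches the quoted `watchdog_timeout = 10` from
 * watchdog-mux.c; 60 s is the client timeout described in the reply. */
#define HW_WATCHDOG_TIMEOUT_S 10L
#define CLIENT_TIMEOUT_S 60L

/* Hypothetical helper: the MUX keeps feeding the hardware watchdog only
 * while the last client ping is less than CLIENT_TIMEOUT_S old. */
bool mux_should_feed_hw(long now_s, long last_client_ping_s)
{
    assert(now_s >= last_client_ping_s);
    return (now_s - last_client_ping_s) < CLIENT_TIMEOUT_S;
}

/* Simulate a client that goes silent at t = 0 while the MUX itself keeps
 * running and feeds the hardware every feed_interval_s seconds (an assumed
 * interval). Returns the second at which the hardware watchdog fires. */
long simulate_reset_time(long feed_interval_s)
{
    assert(feed_interval_s > 0 && feed_interval_s <= HW_WATCHDOG_TIMEOUT_S);

    const long last_client_ping_s = 0;
    long last_hw_feed_s = 0;

    for (long t = 0;; t++) {
        /* Feed on schedule, but only while the client is still fresh. */
        if (t % feed_interval_s == 0 && mux_should_feed_hw(t, last_client_ping_s))
            last_hw_feed_s = t;
        /* The hardware fires once HW_WATCHDOG_TIMEOUT_S passes without a feed. */
        if (t - last_hw_feed_s >= HW_WATCHDOG_TIMEOUT_S)
            return t;
    }
}
```

With a 1s feed interval the reset lands at 69s; with a 10s interval it lands at 60s. That spread is the "0s to 10s later" on top of the 60s client timeout.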

> 
> https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/watchdog-mux.c#l33
>   33 int watchdog_timeout = 10;
> https://git.proxmox.com/?p=pve-ha-manager.git;a=blob;f=src/watchdog-mux.c#l157
>  157     if (ioctl(watchdog_fd, WDIOC_SETTIMEOUT, &watchdog_timeout) == -1) {
> 
> I am trying to use a more conservative 5 minute timeout for the IPMI watchdog but it gets changed to 10 seconds when the watchdog-mux.service starts.

That's not a reasonable timeout for Proxmox VE's HA self-fencing, as pmxcfs locks have
a timeout of 2 minutes. If you go above that, all consistency guarantees from the
self-fencing are void and an HA service can be recovered while the original one still
accesses some of its resources; in other words, here be dragons.
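The constraint can be written down as a simple C check. This is a simplified model: it assumes the node must be reset strictly before the 2-minute pmxcfs lock timeout and ignores any other delays. The function name and constant are hypothetical; the numbers come from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* From the reply: pmxcfs locks time out after 2 minutes. */
#define PMXCFS_LOCK_TIMEOUT_S 120

/* Hypothetical check (simplified model): self-fencing only keeps its
 * guarantees if a node that lost contact is reset before its lock can
 * expire and the HA service is recovered elsewhere. The worst-case reset
 * time is the client timeout plus the hardware watchdog timeout. */
bool fencing_is_safe(int client_timeout_s, int hw_timeout_s)
{
    assert(client_timeout_s >= 0 && hw_timeout_s >= 0);
    return client_timeout_s + hw_timeout_s < PMXCFS_LOCK_TIMEOUT_S;
}
```

The default 60s + 10s gives 70s, below 120s; the 5-minute IPMI timeout from the question gives 60s + 300s = 360s, far beyond it.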

PS: Personally, I'd only rely on a HW watchdog if I'm really sure it runs stably. Most
of the time their firmware is just a mess and has so many bugs that the kernel's
softdog, which is itself a quite small and simple kernel module, works more reliably.
YMMV, but I have never seen a situation where the softdog didn't do its job, whereas we
did get some reports of failing HW watchdogs; not /that/ many, but most users go with
the default setup, so this may be biased.

hope that helps,
Thomas
