From: Aaron Lauterer <a.lauterer@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Maximiliano Sandoval <m.sandoval@proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager 2/3] watchdog: warn when about to expire
Date: Mon, 16 Jun 2025 10:37:23 +0200
Message-ID: <e3e15289-f7b4-4f33-8d4b-5aef2862d5a6@proxmox.com>
In-Reply-To: <20250519130935.365142-3-m.sandoval@proxmox.com>



On 2025-05-19 15:09, Maximiliano Sandoval wrote:
> Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
> ---
>   src/watchdog-mux.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
> diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
> index a9017b3..e14c768 100644
> --- a/src/watchdog-mux.c
> +++ b/src/watchdog-mux.c
> @@ -29,15 +29,24 @@
>   
>   #define JOURNALCTL_BIN "/bin/journalctl"
>   
> +#define CLIENT_WATCHDOG_TIMEOUT_WARNING 50

A comment explaining why 50 is used would be useful. Alternatively, it
could be defined a bit further below, using client_watchdog_timeout as
the reference:

client_watchdog_timeout_warning = client_watchdog_timeout - 10;

That would make it clearer that we want to react in the last 10
seconds before the timeout. But that is just an idea.
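
A minimal sketch of that idea, not a tested change to watchdog-mux.c.
The macro name CLIENT_WATCHDOG_WARNING_MARGIN and the function form are
assumptions made up for illustration; only client_watchdog_timeout and
its default of 60 come from the existing code:

```c
/* Hypothetical name: how many seconds before the client timeout we warn. */
#define CLIENT_WATCHDOG_WARNING_MARGIN 10

/* Existing global in watchdog-mux.c, repeated here so the sketch compiles. */
int client_watchdog_timeout = 60;

/*
 * Derive the warning threshold from the timeout instead of hard-coding 50,
 * so the two values cannot drift apart if the timeout is ever changed.
 */
int
client_watchdog_timeout_warning(void)
{
    return client_watchdog_timeout - CLIENT_WATCHDOG_WARNING_MARGIN;
}
```

With the default timeout of 60 this yields the same threshold of 50 as
the patch, but the "last 10 seconds" intent is visible in the code.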

> +
>   int watchdog_fd = -1;
>   int watchdog_timeout = 10;
>   int client_watchdog_timeout = 60;
>   int update_watchdog = 1;
>   
> +enum warning_state_t {
> +   NONE,
> +   WARNING_ISSUED,
> +   CRISIS_AVERTED,

I don't like the "CRISIS" vocabulary here. Why not call it
"FENCE_AVERTED" or "HOST_FENCE_AVERTED"?
It sounds less sensational and names what is actually averted.
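
To illustrate, here is a rough sketch of the enum with the renamed
state, plus the two transitions from the hunk further down pulled into
a helper. The helper next_warning_state() and its parameters are
hypothetical and not part of the patch; the logic is meant to mirror
the two if blocks in main():

```c
#include <time.h>

enum warning_state_t {
    NONE,
    WARNING_ISSUED,
    FENCE_AVERTED, /* called CRISIS_AVERTED in the patch */
};

/*
 * Hypothetical helper: compute the next warning state of a client.
 * elapsed is the time since the client's last update (ctime - client time),
 * warn_after is the threshold after which the warning is issued.
 */
enum warning_state_t
next_warning_state(enum warning_state_t state, time_t elapsed, time_t warn_after)
{
    /* A warning was issued, but the client updated in time. */
    if (state == WARNING_ISSUED && elapsed <= warn_after)
        return FENCE_AVERTED;

    /* No warning pending yet and the client is getting close to expiring. */
    if (state != WARNING_ISSUED && elapsed > warn_after)
        return WARNING_ISSUED;

    return state;
}
```

The fprintf() calls from the patch would then go wherever the returned
state differs from the previous one.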

> +};
> +
>   typedef struct {
>       int fd;
>       time_t time;
>       int magic_close;
> +    enum warning_state_t warning_state;
>   } wd_client_t;
>   
>   #define MAX_CLIENTS 100
> @@ -54,6 +63,7 @@ alloc_client(int fd, time_t time)
>               client_list[i].fd = fd;
>               client_list[i].time = time;
>               client_list[i].magic_close = 0;
> +            client_list[i].warning_state = NONE;
>               return &client_list[i];
>           }
>       }
> @@ -244,6 +254,22 @@ main(void)
>                   time_t ctime = time(NULL);
>                   for (i = 0; i < MAX_CLIENTS; i++) {
>                       if (client_list[i].fd != 0 && client_list[i].time != 0) {
> +                        if (
> +                            client_list[i].warning_state == WARNING_ISSUED
> +                            && (ctime - client_list[i].time) <= CLIENT_WATCHDOG_TIMEOUT_WARNING
> +                        ) {
> +                            client_list[i].warning_state = CRISIS_AVERTED;
> +                            fprintf(stderr, "phew, client watchdog was updated before expiring\n");
> +                        }
> +
> +                        if (
> +                            client_list[i].warning_state != WARNING_ISSUED
> +                            && (ctime - client_list[i].time) > CLIENT_WATCHDOG_TIMEOUT_WARNING
> +                        ) {
> +                            client_list[i].warning_state = WARNING_ISSUED;
> +                            fprintf(stderr, "client watchdog is about to expire\n");
> +                        }
> +
>                           if ((ctime - client_list[i].time) > client_watchdog_timeout) {
>                               update_watchdog = 0;
>                               fprintf(stderr, "client watchdog expired - disable watchdog updates\n");



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



Thread overview: 9+ messages
2025-05-19 13:09 [pve-devel] [PATCH ha-manager 0/3] watchdog: sync log to disk before and after expiring Maximiliano Sandoval
2025-05-19 13:09 ` [pve-devel] [PATCH ha-manager 1/3] watchdog: separate if in two parts Maximiliano Sandoval
2025-05-19 13:09 ` [pve-devel] [PATCH ha-manager 2/3] watchdog: warn when about to expire Maximiliano Sandoval
2025-06-16  8:37   ` Aaron Lauterer [this message]
2025-06-17  6:11   ` Thomas Lamprecht
2025-05-19 13:09 ` [pve-devel] [PATCH ha-manager 3/3] watchdog: sync journal after sending expiration related messages Maximiliano Sandoval
2025-06-17  6:21   ` Thomas Lamprecht
2025-07-04 12:32     ` Maximiliano Sandoval
2025-06-16  8:40 ` [pve-devel] [PATCH ha-manager 0/3] watchdog: sync log to disk before and after expiring Aaron Lauterer
