From: Maximiliano Sandoval
To: pve-devel@lists.proxmox.com
Date: Fri, 4 Jul 2025 15:38:56 +0200
Message-Id: <20250704133902.398663-1-m.sandoval@proxmox.com>
Subject: [pve-devel] [PATCH ha-manager v3 0/6] watchdog-mux: sync log to disk before and after expiring

Without a clear-cut message in the log, it is very hard to give a
definitive answer to whether a host was fenced or not. In some cases the
journal on disk can be missing up to 2 minutes between its last logged
entry and the time at which another node detects that the corosync link is
down. With such a gap, the fenced node does not even record that it lost
its connection, and it is not possible to fully determine whether the node
was fenced.

This series:

- adds a second warning 10 seconds before the watchdog expires
- syncs the journal to disk after the warning was issued
- syncs the journal to disk after the watchdog expires
- allows watchdog-mux to exit(EXIT_SUCCESS) before the fence (new in v3)

Differences from v2:

- Instead of explicitly adding a call to sync the journal after updates are
  disabled, help the process break out of the loop, so that it reaches the
  code that syncs the journal and then calls exit()

Differences from v1:

- Define the warning cutoff based on the 60 second timeout
- Change log messages and constant names
- When not immediately fencing, run the journal sync in a double fork

Maximiliano Sandoval (6):
  watchdog-mux: Use #define for 60s timeout
  watchdog-mux: split if block in two if blocks
  watchdog-mux: warn when about to expire
  watchdog-mux: sync journal right after fence warning
  watchdog-mux: break out of loop when updates are disabled
  watchdog-mux: Remove wrapping if guard

 src/watchdog-mux.c | 61 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 12 deletions(-)

--
2.39.5
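
For readers who want a concrete picture of the warning cutoff and the
double-fork journal sync described in this series, here is a minimal sketch
in C. It is not taken from the patches: the constant names, the enum, and
both function names are hypothetical, and it assumes the sync is performed
by running `journalctl --sync`, which the cover letter does not state. Only
the 60 second timeout and the 10 second warning window come from the text
above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Values from the cover letter; the names are hypothetical. */
#define CLIENT_TIMEOUT_S 60
#define WARN_BEFORE_EXPIRY_S 10

enum wd_state { WD_OK, WD_WARN, WD_EXPIRED };

/* Classify a client by the number of seconds since its last update. */
enum wd_state classify_client(long idle_s)
{
    assert(idle_s >= 0);
    if (idle_s >= CLIENT_TIMEOUT_S)
        return WD_EXPIRED;
    if (idle_s >= CLIENT_TIMEOUT_S - WARN_BEFORE_EXPIRY_S)
        return WD_WARN;
    return WD_OK;
}

/*
 * Start `journalctl --sync` (an assumption, see above) in a grandchild.
 * The caller only waits for the short-lived intermediate child, not for
 * the sync itself, and the grandchild is reparented so it leaves no
 * zombie behind. Returns 0 if the grandchild was forked, -1 otherwise.
 */
int sync_journal_detached(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0) {
            execlp("journalctl", "journalctl", "--sync", (char *)NULL);
            _exit(127); /* only reached if exec failed */
        }
        /* Intermediate child exits right away. */
        _exit(grandchild < 0 ? 1 : 0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In a main loop, a client classified as WD_WARN would trigger the log
warning followed by sync_journal_detached(), while WD_EXPIRED would stop
watchdog updates. Once the watchdog has expired there is little time left,
which is presumably why the series only uses the double fork when not
immediately fencing.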