From: Aaron Lauterer
Date: Mon, 16 Jun 2025 10:40:48 +0200
To: Proxmox VE development discussion, Maximiliano Sandoval
In-Reply-To: <20250519130935.365142-1-m.sandoval@proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager 0/3] watchdog: sync log to disk before and after expiring

Tested it by applying this series to a node with HA guests and then
either disabling the corosync network completely or, to test the
"averted" log, sleeping for 45 seconds before bringing the corosync
network back up.

So far, it seems that the "about to expire" warning did make it into
the journal in my tests. We will see in the future how well that works
on production systems, depending on the underlying storage layer.

Some smaller remarks on patch 2/3.

Consider this series:
Tested-By: Aaron Lauterer
Reviewed-By: Aaron Lauterer

On 2025-05-19 15:09, Maximiliano Sandoval wrote:
> It is very hard to provide a definitive answer to whether a host was
> fenced or not. In some cases the journal on the disk can be missing up
> to 2 minutes between its last logged entry and the time when another
> node detects that the corosync link is down. With such a gap, the
> fenced node would not even record that it lost connection, and it is
> not possible to fully determine whether the node was fenced or not.
>
> This series:
> - adds a second warning 10 seconds before the watchdog expires
> - syncs the journal to disk after the warning was issued
> - syncs the journal to disk after the watchdog expires
>
> The variable names in the second commit could use some feedback. The
> warning timeout was chosen arbitrarily (10 seconds before the fence).
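For readers less familiar with watchdog-mux, the warn/sync/expire
sequence from the cover letter could look roughly like the C sketch
below. It is not the code from this series: the struct and function
names, the log messages, the 60 second client timeout and the use of
`journalctl --sync` to flush the journal are assumptions for
illustration; only the 10 second warning margin comes from the cover
letter. It also assumes stderr is captured by journald, as for a
typical systemd service.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Assumed values: only the 10 s warning margin is from the cover letter,
 * the 60 s client timeout is a placeholder for illustration. */
#define CLIENT_TIMEOUT_SECS 60
#define WARN_MARGIN_SECS 10

enum wd_state { WD_OK, WD_WARN, WD_EXPIRED };

/* Hypothetical per-client bookkeeping. */
struct client {
    time_t last_update; /* time of the client's last keep-alive */
    bool warned;        /* "about to expire" already logged */
    bool expired;       /* expiry already logged */
};

/* Hypothetical helper: ask journald to write pending entries to disk.
 * `journalctl --sync` only returns once the sync has completed. */
void sync_journal(void)
{
    if (system("journalctl --sync") != 0)
        fprintf(stderr, "watchdog: journal sync failed\n");
}

/* Classify a client by the seconds elapsed since its last update. */
static enum wd_state wd_classify(time_t elapsed)
{
    if (elapsed > CLIENT_TIMEOUT_SECS)
        return WD_EXPIRED;
    if (elapsed > CLIENT_TIMEOUT_SECS - WARN_MARGIN_SECS)
        return WD_WARN;
    return WD_OK;
}

/* Run once per main-loop iteration. Each transition is logged only once;
 * the warning and the expiry are followed by a journal sync so the message
 * is more likely to be on disk before the node is fenced. `sync_fn` is a
 * parameter so tests can substitute a stub for sync_journal(). */
enum wd_state check_client(struct client *c, time_t now, void (*sync_fn)(void))
{
    enum wd_state st = wd_classify(now - c->last_update);

    switch (st) {
    case WD_OK:
        if (c->warned && !c->expired) {
            /* client recovered after the warning: log it, no sync */
            fprintf(stderr, "watchdog: client watchdog expiration averted\n");
            c->warned = false;
        }
        break;
    case WD_WARN:
        if (!c->warned) {
            fprintf(stderr, "watchdog: client watchdog about to expire\n");
            c->warned = true;
            sync_fn();
        }
        break;
    case WD_EXPIRED:
        if (!c->expired) {
            fprintf(stderr, "watchdog: client watchdog expired\n");
            c->expired = true;
            sync_fn();
        }
        break;
    }
    return st;
}
```

In a real main loop one would pass `sync_journal` as `sync_fn`. Note
that `journalctl --sync` blocks until journald has finished writing,
which an actual implementation would have to weigh against keeping the
loop responsive shortly before the hardware watchdog fires.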
>
> Maximiliano Sandoval (3):
>   watchdog: separate if in two parts
>   watchdog: warn when about to expire
>   watchdog: sync journal after sending expiration related messages
>
>  src/watchdog-mux.c | 40 +++++++++++++++++++++++++++++++++-------
>  1 file changed, 33 insertions(+), 7 deletions(-)
>

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel