From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2efc6fc2-62ec-4822-829f-780d95938fe4@proxmox.com>
Date: Fri, 27 Mar 2026 02:17:53 +0100
Subject: Re: [PATCH ha-manager v2 3/4] fix #2751: implement disarm-ha and arm-ha for safe cluster maintenance
To: Fiona Ebner, pve-devel@lists.proxmox.com
References: <20260321234350.2158438-1-t.lamprecht@proxmox.com> <20260321234350.2158438-4-t.lamprecht@proxmox.com>
From: Thomas Lamprecht
Content-Type: text/plain; charset=UTF-8
List-Id: Proxmox VE development discussion

On 25.03.26 at 16:49, Fiona Ebner wrote:
> On 22.03.26 at 12:57 AM, Thomas Lamprecht wrote:
>> +    if ($mode eq 'freeze') {
>> +        for my $sid (sort keys %$ss) {
>> +            my $sd = $ss->{$sid};
>> +            my $state = $sd->{state};
>> +            next if $state eq 'freeze'; # already frozen
>> +            if (
>> +                $state eq 'started'
>> +                || $state eq 'stopped'
>> +                || $state eq 'request_stop'
>> +                || $state eq 'request_start'
>> +                || $state eq 'request_start_balance'
>> +                || $state eq 'error'
> Should it really happen for the 'error' state too? Because when
> re-arming, the state will become 'started':
>
> Mar 25 16:20:06 pve9a1 pve-ha-crm[242553]: disarm: freezing service
> 'vm:400' (was 'error')
> ...
> Mar 25 16:20:36 pve9a1 pve-ha-crm[242553]: service 'vm:400': state
> changed from 'freeze' to 'started'
>
> Which feels rather surprising to me. For comparison, after a cold
> cluster start, services in 'error' state are not (attempted to be)
> started either.

You're right, thanks for noticing this. Fixed in:

https://git.proxmox.com/?p=pve-ha-manager.git;a=commitdiff;h=890673383c27a2949315a306218ef035042b46b0;hp=b6b025a268032ff5302bede1f5eb56247af13f21

That said, I'd like to revisit the error state and its being a final status
in the FSM, as in the context of HA some things might often be better off
just being retried forever, with a rate limit.
Anyhow, definitely orthogonal to this series.
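
For illustration only, here is a minimal standalone sketch of the behaviour discussed above: selecting services to freeze on disarm while leaving 'error' alone, so that re-arming cannot turn it into 'started'. This is not the code from the linked commit. The helper name services_to_freeze and the sample data are hypothetical, and the state list contains only what is visible in the quoted hunk (minus 'error'); the real condition may cover more states.

```perl
use strict;
use warnings;

# States that a disarm in 'freeze' mode may freeze. Assumption: this is only
# the part of the list visible in the quoted hunk. 'error' is deliberately
# absent, so a service in 'error' is never frozen and thus never unfrozen
# into 'started' on re-arm.
my %FREEZABLE = map { $_ => 1 } qw(
    started stopped request_stop request_start request_start_balance
);

# Hypothetical helper: return the sorted service IDs that should be frozen.
# Services already in 'freeze' or in 'error' are skipped.
sub services_to_freeze {
    my ($ss) = @_;
    return [ grep { $FREEZABLE{ $ss->{$_}->{state} } } sort keys %$ss ];
}

# Made-up service status map, shaped like the $ss hash in the quoted patch.
my $ss = {
    'vm:100' => { state => 'started' },
    'vm:200' => { state => 'freeze' },
    'vm:400' => { state => 'error' },
};

my $to_freeze = services_to_freeze($ss);
print join(', ', @$to_freeze), "\n";
```

With the sample data, only vm:100 is selected; vm:400 keeps its 'error' state across a disarm and re-arm cycle, matching what happens after a cold cluster start.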