From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id CACC71FF15C
	for <inbox@lore.proxmox.com>; Fri, 17 Oct 2025 14:33:07 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id B627E1FBC;
	Fri, 17 Oct 2025 14:33:27 +0200 (CEST)
Message-ID: <cd768499-bde3-4c39-bf04-cdec71ed9464@proxmox.com>
Date: Fri, 17 Oct 2025 14:32:53 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Daniel Kral <d.kral@proxmox.com>
References: <20250930142021.366529-1-d.kral@proxmox.com>
 <20250930142021.366529-12-d.kral@proxmox.com>
Content-Language: en-US
From: Fiona Ebner <f.ebner@proxmox.com>
In-Reply-To: <20250930142021.366529-12-d.kral@proxmox.com>
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1760704370276
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.021 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] [PATCH ha-manager 8/9] manager: make online node
 usage computation granular
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

Am 30.09.25 um 4:20 PM schrieb Daniel Kral:
> The HA Manager builds $online_node_usage in every FSM iteration in
> manage(...) and at every HA resource state change in
> change_service_state(...). This becomes quite costly with a high HA
> resource count and a lot of state changes happening at once, e.g.
> starting up multiple nodes with rebalance_on_request_start set or a
> failover of a node with many configured HA resources.
> 
> To improve this situation, make the changes to the $online_node_usage
> more granular by building $online_node_usage only once per call to
> manage(...) and changing the nodes an HA resource uses individually on
> every HA resource state transition.
> 
> The change in service usage "freshness" should be negligible here as the
> static service usage data is cached anyway (except if the cache fails
> for some reason).

But the cache is refreshed on every recompute_online_node_usage(), which
happened much more frequently before, so the fact that it's cached
doesn't seem like a strong argument here?

I /do/ think there is a real tradeoff being made, namely "the ability to
manage much larger fleets of guests" versus "immediately incorporating
every guest config change in decisions". Config changes that would lead
to wildly different decisions would need to be timed very badly to cause
actual issues and should be rare to begin with. Also, with PSI-based
information, things are less "instant" anyway, so I don't see an issue
with moving in the same direction.

> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>

> ---
> The add_service_usage(...) helper is added in anticipation of the next
> patch; we don't need a helper if we don't go for #9.

I think it's nice to have regardless. Inlining the function would just
bloat change_service_state(), or what would the alternative be?

> @@ -314,7 +329,8 @@ my $change_service_state = sub {
>          $sd->{$k} = $v;
>      }
>  
> -    $self->recompute_online_node_usage();
> +    $self->{online_node_usage}->remove_service_usage($sid);
> +    $self->add_service_usage($sid, $sd);

Nice!

>  
>      $sd->{uid} = compute_new_uuid($new_state);
>  

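To illustrate why the remove-then-add pair in the hunk above can stand in
for a full recompute, here is a toy sketch. It is not the actual
ha-manager code: the ToyUsage package, its add_service_usage_to_node()
method, the rebuild()/update_one() helpers and the simple per-node
service count are all hypothetical, and it assumes that a migrating
service counts towards both its current and its target node.

```perl
use strict;
use warnings;

# Toy stand-in for the usage object; NOT the real PVE::HA::Usage API.
package ToyUsage;

sub new {
    my ($class, @nodes) = @_;
    my $self = {
        nodes => { map { $_ => 0 } @nodes },    # node => number of services
        service_nodes => {},                    # sid => { node => 1 }
    };
    return bless $self, $class;
}

# Count $sid towards $node (hypothetical helper).
sub add_service_usage_to_node {
    my ($self, $node, $sid) = @_;
    return if !exists $self->{nodes}->{$node};            # node not online
    return if $self->{service_nodes}->{$sid}->{$node};    # already counted
    $self->{service_nodes}->{$sid}->{$node} = 1;
    $self->{nodes}->{$node}++;
}

# Drop everything $sid contributed, on all nodes it was counted on.
sub remove_service_usage {
    my ($self, $sid) = @_;
    my $used = delete $self->{service_nodes}->{$sid};
    return if !$used;
    $self->{nodes}->{$_}-- for keys %$used;
}

package main;

# Full rebuild: walks every service, so cost grows with the service count.
sub rebuild {
    my ($nodes, $services) = @_;
    my $usage = ToyUsage->new(@$nodes);
    for my $sid (sort keys %$services) {
        my $sd = $services->{$sid};
        $usage->add_service_usage_to_node($sd->{node}, $sid);
        $usage->add_service_usage_to_node($sd->{target}, $sid)
            if defined $sd->{target};
    }
    return $usage;
}

# Incremental update: touches only the service whose state changed.
sub update_one {
    my ($usage, $sid, $sd) = @_;
    $usage->remove_service_usage($sid);
    $usage->add_service_usage_to_node($sd->{node}, $sid);
    $usage->add_service_usage_to_node($sd->{target}, $sid)
        if defined $sd->{target};
}

my $nodes = ['node1', 'node2'];
my $services = {
    'vm:100' => { node => 'node1' },
    'vm:101' => { node => 'node1' },
};
my $usage = rebuild($nodes, $services);

# vm:101 starts migrating to node2, so it now counts on both nodes.
$services->{'vm:101'}->{target} = 'node2';
update_one($usage, 'vm:101', $services->{'vm:101'});

# The incrementally updated counts must equal a from-scratch rebuild.
my $fresh = rebuild($nodes, $services);
for my $node (@$nodes) {
    die "mismatch on $node\n"
        if $usage->{nodes}->{$node} != $fresh->{nodes}->{$node};
}
print "incremental update matches full rebuild\n";
```

What the sketch cannot show is the freshness tradeoff discussed above:
the toy counts never go stale, whereas the real static usage data is
only refreshed when the full recompute runs.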
