Date: Tue, 12 May 2026 14:05:41 +0200
From: "Dominik Rusovac"
To: "Daniel Kral"
Subject: Re: [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox v3 0/6] clamp load imbalance to unit interval
References: <20260430114845.151174-1-d.rusovac@proxmox.com>
List-Id: Proxmox VE development discussion

thx for testing and reviewing this series
will resolve the nits

On Tue May 12, 2026 at 1:25 PM CEST, Daniel Kral wrote:
> On Thu Apr 30, 2026 at 1:48 PM CEST, Dominik Rusovac wrote:
>> # TL;DR
>> clamp load imbalance to a value between 0 and 1, and display the value as
>> a percentage in the HA Status panel of the PVE UI.
>>
>> NOTE: This needs a version bump for proxmox-perl-rs, because the changes in
>> proxmox-resource-scheduling need to be propagated through those bindings.
>>
>> # Details
>> The currently used load imbalance value is given as the so-called coefficient of
>> variation (CV), a value that may exceed 1. As such, the CV value alone lacks
>> meaning. A CV value of 0.0 means no imbalance, but what does a value of, say,
>> 1.7 mean?
>>
>> Relative to the number of nodes in a cluster, it is possible to determine the
>> upper bound of the CV value [0][1]. By dividing the CV value by its upper
>> bound, the load imbalance can be represented as a value that varies between 0
>> and 1. Expressing the CV as a percentage makes the concept of load imbalance
>> easier to interpret.
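
For readers of the thread, the normalization described above can be sketched roughly as follows. This is only an illustration, not the code in proxmox-resource-scheduling: the function name `normalized_imbalance` is hypothetical, and it assumes non-negative per-node loads and the population standard deviation, for which the CV of n values is bounded by sqrt(n - 1), the same bound used in the threshold example further down the thread.

```python
import math


def normalized_imbalance(loads):
    """Hypothetical helper: CV of per-node loads divided by its upper bound.

    Assumes non-negative loads and the population standard deviation,
    so that the CV of n values cannot exceed sqrt(n - 1).
    """
    n = len(loads)
    if n < 2:
        return 0.0  # a single node cannot be imbalanced
    mean = sum(loads) / n
    if mean == 0:
        return 0.0  # no load at all, treat as balanced
    variance = sum((x - mean) ** 2 for x in loads) / n  # population variance
    cv = math.sqrt(variance) / mean
    # Divide by the upper bound; min() only guards against float rounding.
    return min(cv / math.sqrt(n - 1), 1.0)
```

With all load on one node of three, e.g. `normalized_imbalance([3.0, 0.0, 0.0])`, the result sits at the upper end of the interval, and evenly spread load gives 0.0.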
>>
>> # Summary of Changes
>> This series:
>> - represents load imbalance as a value between 0 and 1;
>> - adds a maximum value of 1.0 for load scheduler options; and
>> - integrates the load imbalance value within the HA status endpoint;
>>   this is to provide feedback on the prevailing load imbalance in the PVE UI.
>>
>> # Refs
>> [0] https://repositorio.ipbeja.pt/server/api/core/bitstreams/8ed9a444-dbe0-402f-9d2f-90c5bf6e418c/content
>> [1] https://stats.stackexchange.com/questions/18621/maximum-value-of-coefficient-of-variation-for-bounded-data-set
>
> I'm running this patch series on my 3-node test cluster. I tested this
> by inducing a few imbalance spikes for a couple of rounds with some
> build jobs running and letting the cluster stay mostly idle for a while.
>
> Works good for me so far, which is to be expected since this should only
> be a mapping to the unit interval. It's also very helpful to be able to
> look at the most recent load imbalance, thanks for the series!
>
> Besides the two small nits, I wonder if we should change the default
> imbalance threshold value of 0.3 as with these patches applied the
> behavior of the load balancer changes and makes the trigger more
> insensitive by default.

good point. we should give this a thought and adjust it in a follow-up

>
> For example, for the smaller cluster sizes:
>
> >>> [(n, 0.3/math.sqrt(n-1), math.sqrt(n-1)) for n in range(2,10)]
> [
>   (2, 0.3, 1.0),
>   (3, 0.21213203435596423, 1.4142135623730951),
>   (4, 0.17320508075688773, 1.7320508075688772),
>   (5, 0.15, 2.0),
>   (6, 0.13416407864998736, 2.23606797749979),
>   (7, 0.12247448713915891, 2.449489742783178),
>   (8, 0.11338934190276816, 2.6457513110645907),
>   (9, 0.10606601717798211, 2.8284271247461903)
> ]
>
> Though this could be done as a follow-up to this series.
>
> Otherwise, consider the patches (with the nits resolved) as:
>
> Reviewed-by: Daniel Kral
> Tested-by: Daniel Kral
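
To make the sensitivity point concrete from the other direction, here is a small sketch of which raw CV an unchanged threshold of 0.3 would correspond to once the value is normalized. The helper name `raw_cv_threshold` is made up for illustration, and it assumes the sqrt(n - 1) upper bound used in the example above; it is not taken from the patches.

```python
import math


def raw_cv_threshold(normalized_threshold, n):
    """Hypothetical helper: raw CV at which a normalized threshold triggers.

    Assumes the normalized value is CV / sqrt(n - 1) for n >= 2 nodes.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    return normalized_threshold * math.sqrt(n - 1)


# Raw CV needed to reach the default threshold of 0.3 for small clusters.
for n in range(2, 10):
    print(n, round(raw_cv_threshold(0.3, n), 3))
```

For two nodes nothing changes, while for larger clusters the raw CV has to grow with sqrt(n - 1) before the same 0.3 is reached, which is why the trigger becomes less sensitive unless the default is lowered.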