From: "Daniel Kral"
To: "Dominik Rusovac"
Date: Tue, 12 May 2026 13:25:34 +0200
Subject: Re: [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox v3 0/6] clamp load imbalance to unit interval
References: <20260430114845.151174-1-d.rusovac@proxmox.com>
In-Reply-To: <20260430114845.151174-1-d.rusovac@proxmox.com>
List-Id: Proxmox VE development discussion

On Thu Apr 30, 2026 at 1:48 PM CEST, Dominik Rusovac wrote:
> # TL;DR
> clamp load imbalance to value between 0 and 1, and
> display the value as percentage in HA Status panel of PVE UI.
>
> NOTE: This needs a version bump for proxmox-perl-rs, because the changes in
> proxmox-resource-scheduling need to be propagated through those bindings.
>
> # Details
> The currently used load imbalance value is given as the so-called coefficient of
> variation (CV), a value that may exceed 1. As such, the CV value alone lacks
> meaning. A CV value of 0.0 means no imbalance, but what does a value of, say,
> 1.7 mean?
>
> Relative to the number of nodes in a cluster, it is possible to determine the
> upper bound of the CV value [0][1]. By dividing the CV value by its upper
> bound, the load imbalance can be represented as a value that varies between 0
> and 1. Expressing the CV as a percentage makes the concept of load imbalance
> easier to interpret.
>
> # Summary of Changes
> This series:
> - represents load imbalance as a value between 0 and 1;
> - adds a maximum value of 1.0 for load scheduler options; and
> - integrates the load imbalance value within the HA status endpoint;
>   this is to provide feedback on the prevailing load imbalance in the PVE UI.
>
> # Refs
> [0] https://repositorio.ipbeja.pt/server/api/core/bitstreams/8ed9a444-dbe0-402f-9d2f-90c5bf6e418c/content
> [1] https://stats.stackexchange.com/questions/18621/maximum-value-of-coefficient-of-variation-for-bounded-data-set

I'm running this patch series on my 3-node test cluster. I tested this by
inducing a few imbalance spikes for a couple of rounds with some build jobs
running and letting the cluster stay mostly idle for a while.

Works well for me so far, which is to be expected since this should only be a
mapping to the unit interval. It's also very helpful to be able to look at the
most recent load imbalance, thanks for the series!
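
For reference, the mapping to the unit interval described in the cover letter
could be sketched roughly as below. This is only an illustration, not the code
from the series: `normalized_imbalance` is a hypothetical name, and the sketch
assumes non-negative per-node loads and the population standard deviation, for
which the CV of n values is bounded above by sqrt(n-1).

```python
import math
from statistics import fmean, pstdev


def normalized_imbalance(loads: list[float]) -> float:
    """Map the coefficient of variation (CV) of `loads` to [0, 1].

    Hypothetical helper for illustration; assumes non-negative loads and
    the population standard deviation, so that CV <= sqrt(n - 1).
    """
    n = len(loads)
    mean = fmean(loads) if n else 0.0
    if n < 2 or mean <= 0.0:
        # A single node or an entirely idle cluster has no imbalance.
        return 0.0
    cv = pstdev(loads) / mean
    # Divide by the upper bound; clamp to guard against rounding errors.
    return min(1.0, max(0.0, cv / math.sqrt(n - 1)))
```

With three nodes, evenly spread loads such as [1, 1, 1] give 0.0, while all
load on a single node, such as [3, 0, 0], gives approximately 1.0.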
Besides the two small nits, I wonder if we should change the default
imbalance threshold value of 0.3, since with these patches applied the
behavior of the load balancer changes and the trigger becomes less sensitive
by default. For example, for the smaller cluster sizes
(columns: n, 0.3/sqrt(n-1), sqrt(n-1)):

    >>> import math
    >>> [(n, 0.3/math.sqrt(n-1), math.sqrt(n-1)) for n in range(2,10)]
    [
        (2, 0.3, 1.0),
        (3, 0.21213203435596423, 1.4142135623730951),
        (4, 0.17320508075688773, 1.7320508075688772),
        (5, 0.15, 2.0),
        (6, 0.13416407864998736, 2.23606797749979),
        (7, 0.12247448713915891, 2.449489742783178),
        (8, 0.11338934190276816, 2.6457513110645907),
        (9, 0.10606601717798211, 2.8284271247461903)
    ]

Though this could be done as a follow-up to this series.

Otherwise, consider the patches (with the nits resolved) as:

Reviewed-by: Daniel Kral
Tested-by: Daniel Kral