From: "Daniel Kral"
To: "Dominik Rusovac"
Date: Tue, 28 Apr 2026 11:21:40 +0200
Subject: Re: [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox 0/7] clamp load imbalance to unit interval
In-Reply-To: <20260427132031.220468-1-d.rusovac@proxmox.com>
References: <20260427132031.220468-1-d.rusovac@proxmox.com>
List-Id: Proxmox VE development discussion

On Mon Apr 27, 2026 at 3:20 PM CEST, Dominik Rusovac wrote:
> # TL;DR
> clamp load imbalance to value between 0 and 1, and display the value as
> percentage in HA Status panel of PVE UI.
>
> # Details
> The currently used load imbalance value is given as the so-called coefficient of
> variation (CV), a value that may exceed 1. As such, the CV value alone lacks
> meaning. A CV value of 0.0 means no imbalance, but what does a value of, say,
> 1.7 mean?
>
> Relative to the number of nodes in a cluster, it is possible to determine the
> upper bound of the CV value [0][1]. By dividing the CV value by its upper
> bound, the load imbalance can be represented as a value that varies between 0
> and 1. Expressing the CV as a percentage makes the concept of load imbalance
> easier to interpret.
>
> # Summary of Changes
> This series:
> - represents load imbalance as a value between 0 and 1;
> - adds a maximum value of 1.0 for load scheduler options; and
> - integrates the load imbalance value within the HA status endpoint;
>   this is to provide feedback on the prevailing load imbalance in the PVE UI.

As discussed off-list, it would be interesting to also keep a history of
the imbalance value for the cluster.

In that discussion we also wondered whether we could derive that history
without changing the rrdcached schema at all, by fetching the
average/maximum values for the already pre-defined time frames (each
minute, each hour, etc.) and then applying the same
calculate_node_imbalance() to those raw values.

I haven't checked how much error this introduces: the rrdcached values
differ from the sampled values that the HA Manager fetches from the
rrddump, simply because they are averaged out. It would be interesting
to see whether the introduced error is negligible.
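
To make the averaging concern concrete, here is a minimal Python sketch,
not the actual calculate_node_imbalance() implementation. It assumes the
CV uses the population standard deviation, whose upper bound for n
non-negative loads is sqrt(n - 1); the function name and the sample
values are hypothetical and constructed only to show the effect.

```python
import math
from statistics import fmean, pstdev


def normalized_imbalance(loads):
    """Return CV / sqrt(n - 1), clamped to [0, 1] (assumed normalization)."""
    n = len(loads)
    if n < 2:
        return 0.0
    mean = fmean(loads)
    if mean == 0:
        return 0.0
    # Coefficient of variation based on the population standard deviation.
    cv = pstdev(loads) / mean
    return min(cv / math.sqrt(n - 1), 1.0)


# Hypothetical per-node load samples (3 nodes) within one RRD time frame.
# In every sample a different node carries most of the load.
samples = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
]

# Imbalance computed per sample, then averaged over the time frame.
per_sample = fmean(normalized_imbalance(s) for s in samples)

# Imbalance computed once from the per-node averages, which is roughly
# what an averaged RRD archive would provide.
node_averages = [fmean(column) for column in zip(*samples)]
from_averages = normalized_imbalance(node_averages)

print(f"per-sample: {per_sample:.3f}, from averages: {from_averages:.3f}")
```

In this deliberately extreme case, the per-sample imbalance is about
0.73 while the value derived from the averages is 0, because the load
peaks rotate between nodes and cancel out in the averages. Real data is
unlikely to be this adversarial, so the actual error would still need to
be measured.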