From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox 1/7] resource-scheduling: clamp imbalance value to unit interval
Date: Wed, 29 Apr 2026 09:18:08 +0200 [thread overview]
Message-ID: <DI5GENTUC9C1.37Z3OK3D7QSE0@proxmox.com> (raw)
In-Reply-To: <DI4O2EE0OX1J.1B95XY2KTZNQ1@proxmox.com>
On Tue Apr 28, 2026 at 11:05 AM CEST, Daniel Kral wrote:
> On Mon Apr 27, 2026 at 3:20 PM CEST, Dominik Rusovac wrote:
>> The currently used load imbalance value is given as the so-called
>> coefficient of variation (CV), a value that may exceed 1. As such, the
>> CV value alone lacks meaning. A CV value of 0.0 means no imbalance, but
>> what does a value of, say, 1.7 mean?
>>
>> Relative to the number of nodes in a cluster, it is possible to
>> determine the upper bound of the CV value [0][1]. By dividing the CV
>> value by its upper bound, the load imbalance can be represented as a
>> value that varies between 0 and 1. Expressing the CV as a percentage
>> makes the concept of load imbalance easier to interpret.
>
> Nice, thanks for the work!
>
> Will test the changes over the week, but just from the better
> readability / interpretability of the imbalance value this should make
> it more user-friendly overall.
thx!
>
>>
>> [0] https://repositorio.ipbeja.pt/server/api/core/bitstreams/8ed9a444-dbe0-402f-9d2f-90c5bf6e418c/content
>> [1] https://stats.stackexchange.com/questions/18621/maximum-value-of-coefficient-of-variation-for-bounded-data-set
>
> and a good read overall, thanks!
>
> A note above Example 1 and the proposition 13 from the first paper [0]
> is interesting here:
>
> All these properties refer to the case where a single sample is
> considered, however, as noted by [16], Dodd’s corrected coefficient
> of variation, CV corr is not suitable for comparative purpose, as
> can be seen in the next example.
>
> Example 1. [...]
>
> [...]
>
> Proposition 13. Dodd’s corrected coefficient of variation, CV corr,
> is sample-size sensitive.
>
> AFAICT this should not be a problem for us to compare these values to
> make decisions since we only compare the values for the same cluster
> size configuration and therefore this shouldn't affect us badly.
I agree
>
>
> On another note, it should also be easier for users to set a imbalance
> threshold value, which is (at least roughly) invariant to the size of
> the cluster. If one or more nodes failed or are in maintenance node,
> this dramatically changes the decisions that the load balancer can make
> anyway, but I wonder whether how much difference it makes to the
> sensitivity to trigger the load balancing system.
>
>>
>> Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
>> ---
>> proxmox-resource-scheduling/src/scheduler.rs | 33 +++++++++++++-------
>> 1 file changed, 21 insertions(+), 12 deletions(-)
>>
[snip]
>> fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
>> - let node_count = nodes.len();
>> + let node_count = nodes.len() as f64;
>
> even though this reduces the amount of 'as f64's below, the node count
> is by its nature a positive integer so it should stay that way.
>
ack
>> +
>> + // imbalance is perfect for less than 2 nodes
>> + if node_count < 2.0 {
>> + return 0.0;
>> + }
>
> this could replace the check `load_sum == 0.0` below, which also makes
> sure that we don't ever divide by zero (and return a NaN as a result),
> and move the assignments for node_loads and load_sum into the else
> branch's code path.
>
> A comment could make it more explicit that this avoids dividing by zero.
>
true, will use early-return pattern, remove the else-branch and add a comment in v2
>> +
>> let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
>>
>> let load_sum = node_loads.iter().sum::<f64>();
>> @@ -36,14 +42,17 @@ fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) ->
>> if load_sum == 0.0 {
>> 0.0
>> } else {
>> - let load_mean = load_sum / node_count as f64;
>> + let load_mean = load_sum / node_count;
>>
>> let squared_diff_sum = node_loads
>> .iter()
>> .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
>> - let load_sd = (squared_diff_sum / node_count as f64).sqrt();
>> + let load_sd = (squared_diff_sum / node_count).sqrt();
>> +
>> + let max_cv = (node_count - 1.0).sqrt();
>> + let cv = load_sd / load_mean;
>
> nit: just for aesthetics, could be reordered to cv and then max_cv
I agree
>
> also to not loose the reference from the patch message with future
> changes, it would be nice to add a comment here why the calculation for
> max_cv is correct.
>
ack, will add some ref in v2
[snip]
next prev parent reply other threads:[~2026-04-29 7:18 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-27 13:20 [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox 0/7] clamp load imbalance to unit interval Dominik Rusovac
2026-04-27 13:20 ` [PATCH proxmox 1/7] resource-scheduling: clamp imbalance value " Dominik Rusovac
2026-04-28 9:05 ` Daniel Kral
2026-04-29 7:18 ` Dominik Rusovac [this message]
2026-04-27 13:20 ` [PATCH proxmox 2/7] resource-scheduling: re-adjust hardcoded imbalance values Dominik Rusovac
2026-04-28 8:53 ` Daniel Kral
2026-04-29 7:18 ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-manager 3/7] ui: from/CRSOptions: add maximum for threshold Dominik Rusovac
2026-04-28 8:52 ` Daniel Kral
2026-04-27 13:20 ` [PATCH pve-ha-manager 4/7] test: re-adjust logged imbalance values Dominik Rusovac
2026-04-28 8:52 ` Daniel Kral
2026-04-29 8:26 ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-ha-manager 5/7] manager: add load imbalance to status Dominik Rusovac
2026-04-28 9:20 ` Daniel Kral
2026-04-27 13:20 ` [PATCH pve-ha-manager 6/7] api: status: " Dominik Rusovac
2026-04-28 9:10 ` Daniel Kral
2026-04-29 8:49 ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-cluster 7/7] datacenter config: add maxima for load scheduler options Dominik Rusovac
2026-04-28 8:53 ` Daniel Kral
2026-04-29 7:48 ` Dominik Rusovac
2026-04-28 9:21 ` [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox 0/7] clamp load imbalance to unit interval Daniel Kral
2026-04-29 12:23 ` Dominik Rusovac
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DI5GENTUC9C1.37Z3OK3D7QSE0@proxmox.com \
--to=d.rusovac@proxmox.com \
--cc=d.kral@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.