public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox 1/7] resource-scheduling: clamp imbalance value to unit interval
Date: Wed, 29 Apr 2026 09:18:08 +0200	[thread overview]
Message-ID: <DI5GENTUC9C1.37Z3OK3D7QSE0@proxmox.com> (raw)
In-Reply-To: <DI4O2EE0OX1J.1B95XY2KTZNQ1@proxmox.com>

On Tue Apr 28, 2026 at 11:05 AM CEST, Daniel Kral wrote:
> On Mon Apr 27, 2026 at 3:20 PM CEST, Dominik Rusovac wrote:
>> The currently used load imbalance value is given as the so-called
>> coefficient of variation (CV), a value that may exceed 1. As such, the
>> CV value alone lacks meaning. A CV value of 0.0 means no imbalance, but
>> what does a value of, say, 1.7 mean?
>>
>> Relative to the number of nodes in a cluster, it is possible to
>> determine the upper bound of the CV value [0][1]. By dividing the CV
>> value by its upper bound, the load imbalance can be represented as a
>> value that varies between 0 and 1. Expressing the CV as a percentage
>> makes the concept of load imbalance easier to interpret.
>
> Nice, thanks for the work!
>
> Will test the changes over the week, but just from the better
> readability / interpretability of the imbalance value this should make
> it more user-friendly overall.

thx!

>
>>
>> [0] https://repositorio.ipbeja.pt/server/api/core/bitstreams/8ed9a444-dbe0-402f-9d2f-90c5bf6e418c/content
>> [1] https://stats.stackexchange.com/questions/18621/maximum-value-of-coefficient-of-variation-for-bounded-data-set
>
> and a good read overall, thanks!
>
> A note above Example 1 and the proposition 13 from the first paper [0]
> is interesting here:
>
>     All these properties refer to the case where a single sample is
>     considered, however, as noted by [16], Dodd’s corrected coefficient
>     of variation, CV corr is not suitable for comparative purpose, as
>     can be seen in the next example.
>
>     Example 1. [...]
>
>     [...]
>
>     Proposition 13. Dodd’s corrected coefficient of variation, CV corr,
>     is sample-size sensitive.
>
> AFAICT this should not be a problem for us to compare these values to
> make decisions since we only compare the values for the same cluster
> size configuration and therefore this shouldn't affect us badly.

I agree

>
>
> On another note, it should also be easier for users to set a imbalance
> threshold value, which is (at least roughly) invariant to the size of
> the cluster. If one or more nodes failed or are in maintenance node,
> this dramatically changes the decisions that the load balancer can make
> anyway, but I wonder whether how much difference it makes to the
> sensitivity to trigger the load balancing system.
>
>>
>> Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
>> ---
>>  proxmox-resource-scheduling/src/scheduler.rs | 33 +++++++++++++-------
>>  1 file changed, 21 insertions(+), 12 deletions(-)
>>

[snip]

>>  fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
>> -    let node_count = nodes.len();
>> +    let node_count = nodes.len() as f64;
>
> even though this reduces the amount of 'as f64's below, the node count
> is by its nature a positive integer so it should stay that way.
>

ack

>> +
>> +    // imbalance is perfect for less than 2 nodes
>> +    if node_count < 2.0 {
>> +        return 0.0;
>> +    }
>
> this could replace the check `load_sum == 0.0` below, which also makes
> sure that we don't ever divide by zero (and return a NaN as a result),
> and move the assignments for node_loads and load_sum into the else
> branch's code path.
>
> A comment could make it more explicit that this avoids dividing by zero.
>

true, will use early-return pattern, remove the else-branch and add a comment in v2

>> +
>>      let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
>>  
>>      let load_sum = node_loads.iter().sum::<f64>();
>> @@ -36,14 +42,17 @@ fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) ->
>>      if load_sum == 0.0 {
>>          0.0
>>      } else {
>> -        let load_mean = load_sum / node_count as f64;
>> +        let load_mean = load_sum / node_count;
>>  
>>          let squared_diff_sum = node_loads
>>              .iter()
>>              .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
>> -        let load_sd = (squared_diff_sum / node_count as f64).sqrt();
>> +        let load_sd = (squared_diff_sum / node_count).sqrt();
>> +
>> +        let max_cv = (node_count - 1.0).sqrt();
>> +        let cv = load_sd / load_mean;
>
> nit: just for aesthetics, could be reordered to cv and then max_cv

I agree

>
> also to not loose the reference from the patch message with future
> changes, it would be nice to add a comment here why the calculation for
> max_cv is correct.
>

ack, will add some ref in v2

[snip]




  reply	other threads:[~2026-04-29  7:18 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-27 13:20 [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox 0/7] clamp load imbalance to unit interval Dominik Rusovac
2026-04-27 13:20 ` [PATCH proxmox 1/7] resource-scheduling: clamp imbalance value " Dominik Rusovac
2026-04-28  9:05   ` Daniel Kral
2026-04-29  7:18     ` Dominik Rusovac [this message]
2026-04-27 13:20 ` [PATCH proxmox 2/7] resource-scheduling: re-adjust hardcoded imbalance values Dominik Rusovac
2026-04-28  8:53   ` Daniel Kral
2026-04-29  7:18     ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-manager 3/7] ui: from/CRSOptions: add maximum for threshold Dominik Rusovac
2026-04-28  8:52   ` Daniel Kral
2026-04-27 13:20 ` [PATCH pve-ha-manager 4/7] test: re-adjust logged imbalance values Dominik Rusovac
2026-04-28  8:52   ` Daniel Kral
2026-04-29  8:26     ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-ha-manager 5/7] manager: add load imbalance to status Dominik Rusovac
2026-04-28  9:20   ` Daniel Kral
2026-04-27 13:20 ` [PATCH pve-ha-manager 6/7] api: status: " Dominik Rusovac
2026-04-28  9:10   ` Daniel Kral
2026-04-29  8:49     ` Dominik Rusovac
2026-04-27 13:20 ` [PATCH pve-cluster 7/7] datacenter config: add maxima for load scheduler options Dominik Rusovac
2026-04-28  8:53   ` Daniel Kral
2026-04-29  7:48     ` Dominik Rusovac
2026-04-28  9:21 ` [RFC PATCH-SERIES cluster/ha-manager/manager/proxmox 0/7] clamp load imbalance to unit interval Daniel Kral
2026-04-29 12:23 ` Dominik Rusovac

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DI5GENTUC9C1.37Z3OK3D7QSE0@proxmox.com \
    --to=d.rusovac@proxmox.com \
    --cc=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal