From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 877771FF144
	for <inbox@lore.proxmox.com>; Tue, 10 Mar 2026 11:40:36 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id A05CB181DB;
	Tue, 10 Mar 2026 11:40:28 +0100 (CET)
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Tue, 10 Mar 2026 11:40:21 +0100
Message-Id: <DGZ1E90K0WCS.2Y9ZYYI3TD7R3@proxmox.com>
Subject: Re: [RFC proxmox 4/5] resource-scheduling: implement rebalancing
 migration selection
From: "Daniel Kral" <d.kral@proxmox.com>
To: "Dominik Rusovac" <d.rusovac@proxmox.com>, <pve-devel@lists.proxmox.com>
X-Mailer: aerc 0.21.0-38-g7088c3642f2c-dirty
References: <20260217141437.584852-1-d.kral@proxmox.com>
 <20260217141437.584852-5-d.kral@proxmox.com>
 <DGYAFS4DDSPR.1NB1BSYR0HIAQ@proxmox.com>
In-Reply-To: <DGYAFS4DDSPR.1NB1BSYR0HIAQ@proxmox.com>
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1773139188129
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.034 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Message-ID-Hash: PSJXCSZ2ES2GK6QC665MUPH7BGDI42YC
X-Message-ID-Hash: PSJXCSZ2ES2GK6QC665MUPH7BGDI42YC
X-MailFrom: d.kral@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pve-devel-owner@lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Subscribe: <mailto:pve-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pve-devel-leave@lists.proxmox.com>

Thanks a lot for taking the time to review this! I've added some inline
notes, though most of them are just ACKs.

On Mon Mar 9, 2026 at 2:32 PM CET, Dominik Rusovac wrote:
> Nice work! Here are my thoughts and some things I noticed.
>
> On Tue Feb 17, 2026 at 3:13 PM CET, Daniel Kral wrote:
>> Assuming that a service will hold the same dynamic resource usage on a
>> new node as on the previous node, score possible migrations, where:
>>
>> - the cluster node imbalance is minimal (bruteforce), or
>>
>> - the shifted root mean square and maximum resource usages of the cpu
>>   and memory is minimal across the cluster nodes (TOPSIS).
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> score_best_balancing_migrations() and select_best_balancing_migration()
>> are separate because there could be future improvements for the single
>> select, but might be unnecessary and redundant (especially since we need
>> to expose it at perlmod and PVE::HA::Usage::{Dynamic,Static} twice).
>
> Regarding future improvements for the "single select", providing an
> `imbalance_threshold` to use for early termination may pay off
> performance-wise, e.g., something along the lines of:

The main idea to limit it in size is because of the
serialization/deserialization to Perl values, but writing it out makes
it clear that this should probably be the responsibility of the caller
(i.e., the perlmod bindings).

I might consider dropping the select_best_balancing_migration(...) here
entirely as we can still add it later on and use
score_best_balancing_migrations(..., limit=3D1) anywhere else now...

>
>     pub fn select_best_balancing_migration<I>(
>         &self,
>         candidates: I,
>         imbalance_threshold: Option<f64>
>     ) -> Option<ScoredMigration>
>     where
>         I: IntoIterator<Item =3D MigrationCandidate>,
>     {
>         let threshold =3D imbalance_threshold.unwrap_or(0.0);
>
>         let mut scored_migrations =3D BinaryHeap::new();
>         for candidate in candidates {
>                 let imbalance =3D self.node_imbalance_with_migration(&can=
didate);
>
>                 let migration =3D ScoredMigration {
>                     migration: candidate.into(),
>                     imbalance,
>                 };
>
>                 if imbalance <=3D threshold {
>                     return Some(migration);
>                 }

Hm, even though this could be nice, I'm not so sure about it as setting
a threshold externally might not be that easy and this will not consider
any later possibly better migration candidates?

>
>                 scored_migrations.push(migration);
>             }
>
>         scored_migrations.pop()
>     }
>>
>
> [snip]=20
>
>> +fn calculate_node_loads(nodes: &[NodeStats]) -> Vec<f64> {
>> +    nodes.iter().map(|stats| stats.load()).collect()
>> +}
>> +
>> +/// Returns the load imbalance among the nodes.
>> +///
>> +/// The load balance is measured as the statistical dispersion of the i=
ndividual node loads.
>> +///
>> +/// The current implementation uses the dimensionless coefficient of va=
riation, which expresses the
>> +/// standard deviation in relation to the average mean of the node load=
s. Additionally, the
>> +/// coefficient of variation is not robust, which is
>
> Parts of the docs are missing, it seems.

Right, will fix that in a v2!

>
>> +fn calculate_node_imbalance(nodes: &[NodeStats]) -> f64 {
>> +    let node_count =3D nodes.len();
>> +    let node_loads =3D calculate_node_loads(nodes);
>> +
>> +    let load_sum =3D node_loads
>> +        .iter()
>> +        .fold(0.0, |sum, node_load| sum + node_load);
>
> To increase readability, consider:
>    =20
>     let load_sum =3D node_loads.iter().sum::<f64>();

Will do!

I still want to document what Dominik pointed out to me off-list: Even
though that Iterator<Item =3D f64>::sum() will return -0.0 for iterators
that yield nothing, but the following `load_sum =3D=3D 0.0` test will still
hold true as -0.0 =3D=3D 0.0 in IEEE754 [0].

[0] https://en.wikipedia.org/wiki/IEEE_754#Comparison_predicates

>
>> +
>> +    // load_sum is guaranteed to be 0.0 for empty nodes
>> +    if load_sum =3D=3D 0.0 {
>> +        0.0
>> +    } else {
>> +        let load_mean =3D load_sum / node_count as f64;
>> +
>> +        let squared_diff_sum =3D node_loads
>> +            .iter()
>> +            .fold(0.0, |sum, node_load| sum + (node_load - load_mean).p=
owi(2));
>> +        let load_sd =3D (squared_diff_sum / node_count as f64).sqrt();
>> +
>> +        load_sd / load_mean
>> +    }
>>  }
>
> [0] Considering where this function is used, expecting a
> designated closure that is used to map `NodeUsage` onto `f64` seems=20
> to be more convenient, e.g.:
>
>     fn calculate_node_imbalance(
>         nodes: &[NodeUsage],
> 	to_load: impl FnMut(&NodeUsage) -> f64,
>     ) -> f64 {
>         let node_count =3D nodes.len();
> 	let node_loads =3D nodes.iter().map(to_load).collect::<Vec<_>>();
>
> 	// ...
>     }
>

Great idea! I'll change it to this in the v2!

>> =20
>
> [snip]
>
>> +    fn node_stats(&self) -> Vec<NodeStats> {
>> +        self.nodes.iter().map(|node| node.stats).collect()
>> +    }
>> +
>> +    /// Returns the individual node loads.
>> +    pub fn node_loads(&self) -> Vec<(String, f64)> {
>> +        self.nodes
>> +            .iter()
>> +            .map(|node| (node.name.to_string(), node.stats.load()))
>> +            .collect()
>> +    }
>> +
>> +    /// Returns the load imbalance among the nodes.
>> +    ///
>> +    /// See [`calculate_node_imbalance`] for more information.
>> +    pub fn node_imbalance(&self) -> f64 {
>> +        let node_stats =3D self.node_stats();
>> +
>> +        calculate_node_imbalance(&node_stats)
>> +    }
>
> Regarding [0], passing a closure compresses two iterations into one, e.g.=
:
>
>     pub fn node_imbalance(&self) -> f64 {
>         calculate_node_imbalance(&self.nodes, |node| node.stats.load())
>     }
>

ACK

>> +
>> +    /// Returns the load imbalance among the nodes as if a specific ser=
vice was moved.
>> +    ///
>> +    /// See [`calculate_node_imbalance`] for more information.
>> +    pub fn node_imbalance_with_migration(&self, migration: &MigrationCa=
ndidate) -> f64 {
>> +        let mut new_node_stats =3D Vec::with_capacity(self.nodes.len())=
;
>> +
>> +        self.nodes.iter().for_each(|node| {
>> +            let mut new_stats =3D node.stats;
>> +
>> +            if node.name =3D=3D migration.source_node {
>> +                new_stats.remove_running_service(&migration.stats);
>> +            } else if node.name =3D=3D migration.target_node {
>> +                new_stats.add_running_service(&migration.stats);
>> +            }
>> +
>> +            new_node_stats.push(new_stats);
>> +        });
>> +
>> +        calculate_node_imbalance(&new_node_stats)
>> +    }
>
> Same here regarding [0], e.g.:
>
>     pub fn node_imbalance_with_migration(&self, migration: &MigrationCand=
idate) -> f64 {
>         calculate_node_imbalance_some(&self.nodes, |node| {
>             let mut new_stats =3D node.stats;
>
>             if node.name =3D=3D migration.source_node {
>                 new_stats.remove_running_service(&migration.stats);
>             } else if node.name =3D=3D migration.target_node {
>                 new_stats.add_running_service(&migration.stats);
>             }
>
>             new_stats.load()
>         })
>     }
>

ACK, nice!

>> +
>> +    /// Score the service motions by the best node imbalance improvemen=
t with exhaustive search.
>> +    pub fn score_best_balancing_migrations<I>(
>> +        &self,
>> +        candidates: I,
>> +        limit: usize,
>> +    ) -> Result<Vec<ScoredMigration>, Error>
>
> Does this really have to return a `Result`? It uses no try operator and
> provides no error handling.

No it does not, seems like it was more out of habit than necessity. Will
remove the Result here in a v2!

>
>> +    where
>> +        I: IntoIterator<Item =3D MigrationCandidate>,
>> +    {
>> +        let mut scored_migrations =3D candidates
>> +            .into_iter()
>> +            .map(|candidate| {
>> +                let imbalance =3D self.node_imbalance_with_migration(&c=
andidate);
>> +
>> +                ScoredMigration {
>> +                    migration: candidate.into(),
>> +                    imbalance,
>> +                }
>> +            })
>> +            .collect::<BinaryHeap<_>>();
>> +
>> +        let mut best_alternatives =3D Vec::new();
>
> Since `limit` declares the capacity of `best_alternatives`, consider:
>
>     let mut best_alternatives =3D Vec::with_capacity(limit);

Thanks, ACK!

>
>> +
>> +        // BinaryHeap::into_iter_sorted() is still in nightly unfortuna=
tely
>> +        while best_alternatives.len() < limit {
>> +            match scored_migrations.pop() {
>> +                Some(alternative) =3D> best_alternatives.push(alternati=
ve),
>> +                None =3D> break,
>> +            }
>> +        }
>> +
>> +        Ok(best_alternatives)
>> +    }