From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id B5E041FF136
	for <inbox@lore.proxmox.com>; Mon, 09 Mar 2026 14:33:36 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 6D3C93473A;
	Mon,  9 Mar 2026 14:33:28 +0100 (CET)
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Mon, 09 Mar 2026 14:32:51 +0100
Message-Id: <DGYAFS4DDSPR.1NB1BSYR0HIAQ@proxmox.com>
Subject: Re: [RFC proxmox 4/5] resource-scheduling: implement rebalancing
 migration selection
From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
X-Mailer: aerc 0.20.0
References: <20260217141437.584852-1-d.kral@proxmox.com>
 <20260217141437.584852-5-d.kral@proxmox.com>
In-Reply-To: <20260217141437.584852-5-d.kral@proxmox.com>
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1773063139250
X-SPAM-LEVEL: Spam detection results:  0
	AWL                    -0.813 Adjusted score from AWL reputation of From:
 address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING             0.1 Missing DMARC policy
	KAM_DMARC_STATUS         0.01 Test Rule for DKIM or SPF Failure with Strict
 Alignment
	RCVD_IN_MSPIKE_H2       0.001 Average reputation (+2)
	RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  0.408 ADMINISTRATOR NOTICE: The query to
 Validity was blocked.  See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
	RCVD_IN_VALIDITY_RPBL_BLOCKED  0.819 ADMINISTRATOR NOTICE: The query to
 Validity was blocked.  See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
	RCVD_IN_VALIDITY_SAFE_BLOCKED  0.903 ADMINISTRATOR NOTICE: The query to
 Validity was blocked.  See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
	SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS               -0.001 SPF: sender matches SPF record
Message-ID-Hash: S2FKLLU6BVRMJG24AONLLBMRXPO4JNJV
X-Message-ID-Hash: S2FKLLU6BVRMJG24AONLLBMRXPO4JNJV
X-MailFrom: d.rusovac@proxmox.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop;
 banned-address; emergency; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation;
 no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.10
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Owner: <mailto:pve-devel-owner@lists.proxmox.com>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Subscribe: <mailto:pve-devel-join@lists.proxmox.com>
List-Unsubscribe: <mailto:pve-devel-leave@lists.proxmox.com>

Nice work! Here are my thoughts and some things I noticed.

On Tue Feb 17, 2026 at 3:13 PM CET, Daniel Kral wrote:
> Assuming that a service will hold the same dynamic resource usage on a
> new node as on the previous node, score possible migrations, where:
>
> - the cluster node imbalance is minimal (bruteforce), or
>
> - the shifted root mean square and maximum resource usages of the cpu
>   and memory is minimal across the cluster nodes (TOPSIS).
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> score_best_balancing_migrations() and select_best_balancing_migration()
> are separate because there could be future improvements for the single
> select, but might be unnecessary and redundant (especially since we need
> to expose it at perlmod and PVE::HA::Usage::{Dynamic,Static} twice).

Regarding future improvements for the "single select", providing an
`imbalance_threshold` to use for early termination may pay off
performance-wise, e.g., something along the lines of:

    pub fn select_best_balancing_migration<I>(
        &self,
        candidates: I,
        imbalance_threshold: Option<f64>
    ) -> Option<ScoredMigration>
    where
        I: IntoIterator<Item =3D MigrationCandidate>,
    {
        let threshold =3D imbalance_threshold.unwrap_or(0.0);

        let mut scored_migrations =3D BinaryHeap::new();
        for candidate in candidates {
                let imbalance =3D self.node_imbalance_with_migration(&candi=
date);

                let migration =3D ScoredMigration {
                    migration: candidate.into(),
                    imbalance,
                };

                if imbalance <=3D threshold {
                    return Some(migration);
                }

                scored_migrations.push(migration);
            }

        scored_migrations.pop()
    }
>

[snip]=20

> +fn calculate_node_loads(nodes: &[NodeStats]) -> Vec<f64> {
> +    nodes.iter().map(|stats| stats.load()).collect()
> +}
> +
> +/// Returns the load imbalance among the nodes.
> +///
> +/// The load balance is measured as the statistical dispersion of the in=
dividual node loads.
> +///
> +/// The current implementation uses the dimensionless coefficient of var=
iation, which expresses the
> +/// standard deviation in relation to the average mean of the node loads=
. Additionally, the
> +/// coefficient of variation is not robust, which is

Parts of the docs are missing, it seems.

> +fn calculate_node_imbalance(nodes: &[NodeStats]) -> f64 {
> +    let node_count =3D nodes.len();
> +    let node_loads =3D calculate_node_loads(nodes);
> +
> +    let load_sum =3D node_loads
> +        .iter()
> +        .fold(0.0, |sum, node_load| sum + node_load);

To increase readability, consider:
   =20
    let load_sum =3D node_loads.iter().sum::<f64>();

> +
> +    // load_sum is guaranteed to be 0.0 for empty nodes
> +    if load_sum =3D=3D 0.0 {
> +        0.0
> +    } else {
> +        let load_mean =3D load_sum / node_count as f64;
> +
> +        let squared_diff_sum =3D node_loads
> +            .iter()
> +            .fold(0.0, |sum, node_load| sum + (node_load - load_mean).po=
wi(2));
> +        let load_sd =3D (squared_diff_sum / node_count as f64).sqrt();
> +
> +        load_sd / load_mean
> +    }
>  }

[0] Considering where this function is used, expecting a
designated closure that is used to map `NodeUsage` onto `f64` seems=20
to be more convenient, e.g.:

    fn calculate_node_imbalance(
        nodes: &[NodeUsage],
	to_load: impl FnMut(&NodeUsage) -> f64,
    ) -> f64 {
        let node_count =3D nodes.len();
	let node_loads =3D nodes.iter().map(to_load).collect::<Vec<_>>();

	// ...
    }

> =20

[snip]

> +    fn node_stats(&self) -> Vec<NodeStats> {
> +        self.nodes.iter().map(|node| node.stats).collect()
> +    }
> +
> +    /// Returns the individual node loads.
> +    pub fn node_loads(&self) -> Vec<(String, f64)> {
> +        self.nodes
> +            .iter()
> +            .map(|node| (node.name.to_string(), node.stats.load()))
> +            .collect()
> +    }
> +
> +    /// Returns the load imbalance among the nodes.
> +    ///
> +    /// See [`calculate_node_imbalance`] for more information.
> +    pub fn node_imbalance(&self) -> f64 {
> +        let node_stats =3D self.node_stats();
> +
> +        calculate_node_imbalance(&node_stats)
> +    }

Regarding [0], passing a closure compresses two iterations into one, e.g.:

    pub fn node_imbalance(&self) -> f64 {
        calculate_node_imbalance(&self.nodes, |node| node.stats.load())
    }

> +
> +    /// Returns the load imbalance among the nodes as if a specific serv=
ice was moved.
> +    ///
> +    /// See [`calculate_node_imbalance`] for more information.
> +    pub fn node_imbalance_with_migration(&self, migration: &MigrationCan=
didate) -> f64 {
> +        let mut new_node_stats =3D Vec::with_capacity(self.nodes.len());
> +
> +        self.nodes.iter().for_each(|node| {
> +            let mut new_stats =3D node.stats;
> +
> +            if node.name =3D=3D migration.source_node {
> +                new_stats.remove_running_service(&migration.stats);
> +            } else if node.name =3D=3D migration.target_node {
> +                new_stats.add_running_service(&migration.stats);
> +            }
> +
> +            new_node_stats.push(new_stats);
> +        });
> +
> +        calculate_node_imbalance(&new_node_stats)
> +    }

Same here regarding [0], e.g.:

    pub fn node_imbalance_with_migration(&self, migration: &MigrationCandid=
ate) -> f64 {
        calculate_node_imbalance_some(&self.nodes, |node| {
            let mut new_stats =3D node.stats;

            if node.name =3D=3D migration.source_node {
                new_stats.remove_running_service(&migration.stats);
            } else if node.name =3D=3D migration.target_node {
                new_stats.add_running_service(&migration.stats);
            }

            new_stats.load()
        })
    }

> +
> +    /// Score the service motions by the best node imbalance improvement=
 with exhaustive search.
> +    pub fn score_best_balancing_migrations<I>(
> +        &self,
> +        candidates: I,
> +        limit: usize,
> +    ) -> Result<Vec<ScoredMigration>, Error>

Does this really have to return a `Result`? It uses no try operator and
provides no error handling.

> +    where
> +        I: IntoIterator<Item =3D MigrationCandidate>,
> +    {
> +        let mut scored_migrations =3D candidates
> +            .into_iter()
> +            .map(|candidate| {
> +                let imbalance =3D self.node_imbalance_with_migration(&ca=
ndidate);
> +
> +                ScoredMigration {
> +                    migration: candidate.into(),
> +                    imbalance,
> +                }
> +            })
> +            .collect::<BinaryHeap<_>>();
> +
> +        let mut best_alternatives =3D Vec::new();

Since `limit` declares the capacity of `best_alternatives`, consider:

    let mut best_alternatives =3D Vec::with_capacity(limit);

> +
> +        // BinaryHeap::into_iter_sorted() is still in nightly unfortunat=
ely
> +        while best_alternatives.len() < limit {
> +            match scored_migrations.pop() {
> +                Some(alternative) =3D> best_alternatives.push(alternativ=
e),
> +                None =3D> break,
> +            }
> +        }
> +
> +        Ok(best_alternatives)
> +    }