From: "Daniel Kral" <d.kral@proxmox.com>
To: "Dominik Rusovac" <d.rusovac@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [RFC proxmox 4/5] resource-scheduling: implement rebalancing migration selection
Date: Tue, 10 Mar 2026 11:40:21 +0100 [thread overview]
Message-ID: <DGZ1E90K0WCS.2Y9ZYYI3TD7R3@proxmox.com> (raw)
In-Reply-To: <DGYAFS4DDSPR.1NB1BSYR0HIAQ@proxmox.com>
Thanks a lot for taking the time to review this! I've added some inline
notes, though most of them are just ACKs.
On Mon Mar 9, 2026 at 2:32 PM CET, Dominik Rusovac wrote:
> Nice work! Here are my thoughts and some things I noticed.
>
> On Tue Feb 17, 2026 at 3:13 PM CET, Daniel Kral wrote:
>> Assuming that a service will hold the same dynamic resource usage on a
>> new node as on the previous node, score possible migrations, where:
>>
>> - the cluster node imbalance is minimal (bruteforce), or
>>
>> - the shifted root mean square and maximum resource usages of the cpu
>> and memory is minimal across the cluster nodes (TOPSIS).
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> score_best_balancing_migrations() and select_best_balancing_migration()
>> are separate because there could be future improvements for the single
>> select, but might be unnecessary and redundant (especially since we need
>> to expose it at perlmod and PVE::HA::Usage::{Dynamic,Static} twice).
>
> Regarding future improvements for the "single select", providing an
> `imbalance_threshold` to use for early termination may pay off
> performance-wise, e.g., something along the lines of:
The main idea to limit it in size is because of the
serialization/deserialization to Perl values, but writing it out makes
it clear that this should probably be the responsibility of the caller
(i.e., the perlmod bindings).
I might consider dropping the select_best_balancing_migration(...) here
entirely as we can still add it later on and use
score_best_balancing_migrations(..., limit=1) anywhere else now...
>
> pub fn select_best_balancing_migration<I>(
> &self,
> candidates: I,
> imbalance_threshold: Option<f64>
> ) -> Option<ScoredMigration>
> where
> I: IntoIterator<Item = MigrationCandidate>,
> {
> let threshold = imbalance_threshold.unwrap_or(0.0);
>
> let mut scored_migrations = BinaryHeap::new();
> for candidate in candidates {
> let imbalance = self.node_imbalance_with_migration(&candidate);
>
> let migration = ScoredMigration {
> migration: candidate.into(),
> imbalance,
> };
>
> if imbalance <= threshold {
> return Some(migration);
> }
Hm, even though this could be nice, I'm not so sure about it as setting
a threshold externally might not be that easy and this will not consider
any later possibly better migration candidates?
>
> scored_migrations.push(migration);
> }
>
> scored_migrations.pop()
> }
>>
>
> [snip]
>
>> +fn calculate_node_loads(nodes: &[NodeStats]) -> Vec<f64> {
>> + nodes.iter().map(|stats| stats.load()).collect()
>> +}
>> +
>> +/// Returns the load imbalance among the nodes.
>> +///
>> +/// The load balance is measured as the statistical dispersion of the individual node loads.
>> +///
>> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
>> +/// standard deviation in relation to the average mean of the node loads. Additionally, the
>> +/// coefficient of variation is not robust, which is
>
> Parts of the docs are missing, it seems.
Right, will fix that in a v2!
>
>> +fn calculate_node_imbalance(nodes: &[NodeStats]) -> f64 {
>> + let node_count = nodes.len();
>> + let node_loads = calculate_node_loads(nodes);
>> +
>> + let load_sum = node_loads
>> + .iter()
>> + .fold(0.0, |sum, node_load| sum + node_load);
>
> To increase readability, consider:
>
> let load_sum = node_loads.iter().sum::<f64>();
Will do!
I still want to document what Dominik pointed out to me off-list: Even
though that Iterator<Item = f64>::sum() will return -0.0 for iterators
that yield nothing, but the following `load_sum == 0.0` test will still
hold true as -0.0 == 0.0 in IEEE754 [0].
[0] https://en.wikipedia.org/wiki/IEEE_754#Comparison_predicates
>
>> +
>> + // load_sum is guaranteed to be 0.0 for empty nodes
>> + if load_sum == 0.0 {
>> + 0.0
>> + } else {
>> + let load_mean = load_sum / node_count as f64;
>> +
>> + let squared_diff_sum = node_loads
>> + .iter()
>> + .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
>> + let load_sd = (squared_diff_sum / node_count as f64).sqrt();
>> +
>> + load_sd / load_mean
>> + }
>> }
>
> [0] Considering where this function is used, expecting a
> designated closure that is used to map `NodeUsage` onto `f64` seems
> to be more convenient, e.g.:
>
> fn calculate_node_imbalance(
> nodes: &[NodeUsage],
> to_load: impl FnMut(&NodeUsage) -> f64,
> ) -> f64 {
> let node_count = nodes.len();
> let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
>
> // ...
> }
>
Great idea! I'll change it to this in the v2!
>>
>
> [snip]
>
>> + fn node_stats(&self) -> Vec<NodeStats> {
>> + self.nodes.iter().map(|node| node.stats).collect()
>> + }
>> +
>> + /// Returns the individual node loads.
>> + pub fn node_loads(&self) -> Vec<(String, f64)> {
>> + self.nodes
>> + .iter()
>> + .map(|node| (node.name.to_string(), node.stats.load()))
>> + .collect()
>> + }
>> +
>> + /// Returns the load imbalance among the nodes.
>> + ///
>> + /// See [`calculate_node_imbalance`] for more information.
>> + pub fn node_imbalance(&self) -> f64 {
>> + let node_stats = self.node_stats();
>> +
>> + calculate_node_imbalance(&node_stats)
>> + }
>
> Regarding [0], passing a closure compresses two iterations into one, e.g.:
>
> pub fn node_imbalance(&self) -> f64 {
> calculate_node_imbalance(&self.nodes, |node| node.stats.load())
> }
>
ACK
>> +
>> + /// Returns the load imbalance among the nodes as if a specific service was moved.
>> + ///
>> + /// See [`calculate_node_imbalance`] for more information.
>> + pub fn node_imbalance_with_migration(&self, migration: &MigrationCandidate) -> f64 {
>> + let mut new_node_stats = Vec::with_capacity(self.nodes.len());
>> +
>> + self.nodes.iter().for_each(|node| {
>> + let mut new_stats = node.stats;
>> +
>> + if node.name == migration.source_node {
>> + new_stats.remove_running_service(&migration.stats);
>> + } else if node.name == migration.target_node {
>> + new_stats.add_running_service(&migration.stats);
>> + }
>> +
>> + new_node_stats.push(new_stats);
>> + });
>> +
>> + calculate_node_imbalance(&new_node_stats)
>> + }
>
> Same here regarding [0], e.g.:
>
> pub fn node_imbalance_with_migration(&self, migration: &MigrationCandidate) -> f64 {
> calculate_node_imbalance_some(&self.nodes, |node| {
> let mut new_stats = node.stats;
>
> if node.name == migration.source_node {
> new_stats.remove_running_service(&migration.stats);
> } else if node.name == migration.target_node {
> new_stats.add_running_service(&migration.stats);
> }
>
> new_stats.load()
> })
> }
>
ACK, nice!
>> +
>> + /// Score the service motions by the best node imbalance improvement with exhaustive search.
>> + pub fn score_best_balancing_migrations<I>(
>> + &self,
>> + candidates: I,
>> + limit: usize,
>> + ) -> Result<Vec<ScoredMigration>, Error>
>
> Does this really have to return a `Result`? It uses no try operator and
> provides no error handling.
No it does not, seems like it was more out of habit than necessity. Will
remove the Result here in a v2!
>
>> + where
>> + I: IntoIterator<Item = MigrationCandidate>,
>> + {
>> + let mut scored_migrations = candidates
>> + .into_iter()
>> + .map(|candidate| {
>> + let imbalance = self.node_imbalance_with_migration(&candidate);
>> +
>> + ScoredMigration {
>> + migration: candidate.into(),
>> + imbalance,
>> + }
>> + })
>> + .collect::<BinaryHeap<_>>();
>> +
>> + let mut best_alternatives = Vec::new();
>
> Since `limit` declares the capacity of `best_alternatives`, consider:
>
> let mut best_alternatives = Vec::with_capacity(limit);
Thanks, ACK!
>
>> +
>> + // BinaryHeap::into_iter_sorted() is still in nightly unfortunately
>> + while best_alternatives.len() < limit {
>> + match scored_migrations.pop() {
>> + Some(alternative) => best_alternatives.push(alternative),
>> + None => break,
>> + }
>> + }
>> +
>> + Ok(best_alternatives)
>> + }
next prev parent reply other threads:[~2026-03-10 10:40 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-17 14:13 [RFC PATCH-SERIES many 00/36] dynamic scheduler + load rebalancer Daniel Kral
2026-02-17 14:13 ` [RFC proxmox 1/5] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
2026-02-17 14:13 ` [RFC proxmox 2/5] resource-scheduling: introduce generic cluster usage implementation Daniel Kral
2026-03-09 13:38 ` Dominik Rusovac
2026-03-10 10:41 ` Daniel Kral
2026-02-17 14:13 ` [RFC proxmox 3/5] resource-scheduling: add dynamic node and service stats Daniel Kral
2026-02-17 14:13 ` [RFC proxmox 4/5] resource-scheduling: implement rebalancing migration selection Daniel Kral
2026-03-09 13:32 ` Dominik Rusovac
2026-03-10 10:40 ` Daniel Kral [this message]
2026-02-17 14:13 ` [RFC proxmox 5/5] resource-scheduling: implement Add and Default for {Dynamic,Static}ServiceStats Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 1/6] pve-rs: resource scheduling: use generic cluster usage implementation Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 2/6] pve-rs: resource scheduling: create service_nodes hashset from array Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 3/6] pve-rs: resource scheduling: store service stats independently of node Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 4/6] pve-rs: resource scheduling: expose auto rebalancing methods Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 5/6] pve-rs: resource scheduling: move pve_static into resource_scheduling module Daniel Kral
2026-02-17 14:14 ` [RFC perl-rs 6/6] pve-rs: resource scheduling: implement pve_dynamic bindings Daniel Kral
2026-02-17 14:14 ` [RFC cluster 1/2] datacenter config: add dynamic load scheduler option Daniel Kral
2026-02-18 11:06 ` Maximiliano Sandoval
2026-02-17 14:14 ` [RFC cluster 2/2] datacenter config: add auto rebalancing options Daniel Kral
2026-02-18 11:15 ` Maximiliano Sandoval
2026-02-17 14:14 ` [RFC ha-manager 01/21] rename static node stats to be consistent with similar interfaces Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 02/21] resources: remove redundant load_config fallback for static config Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 03/21] remove redundant service_node and migration_target parameter Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 04/21] factor out common pve to ha resource type mapping Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 05/21] derive static service stats while filling the service stats repository Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 06/21] test: make static service usage explicit for all resources Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 07/21] make static service stats indexable by sid Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 08/21] move static service stats repository to PVE::HA::Usage::Static Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 09/21] usage: augment service stats with node and state information Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 10/21] include running non-HA resources in the scheduler's accounting Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 11/21] env, resources: add dynamic node and service stats abstraction Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 12/21] env: pve2: implement dynamic node and service stats Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 13/21] sim: hardware: pass correct types for static stats Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 14/21] sim: hardware: factor out static stats' default values Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 15/21] sim: hardware: rewrite set-static-stats Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 16/21] sim: hardware: add set-dynamic-stats for services Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 17/21] usage: add dynamic usage scheduler Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 18/21] manager: rename execute_migration to queue_resource_motion Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 19/21] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 20/21] implement automatic rebalancing Daniel Kral
2026-02-17 14:14 ` [RFC ha-manager 21/21] test: add basic automatic rebalancing system test cases Daniel Kral
2026-02-17 14:14 ` [RFC manager 1/2] ui: dc/options: add dynamic load scheduler option Daniel Kral
2026-02-18 11:10 ` Maximiliano Sandoval
2026-02-17 14:14 ` [RFC manager 2/2] ui: dc/options: add auto rebalancing options Daniel Kral
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DGZ1E90K0WCS.2Y9ZYYI3TD7R3@proxmox.com \
--to=d.kral@proxmox.com \
--cc=d.rusovac@proxmox.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox