From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
Date: Tue, 24 Mar 2026 19:30:00 +0100
Message-ID: <20260324183029.1274972-17-d.kral@proxmox.com>
In-Reply-To: <20260324183029.1274972-1-d.kral@proxmox.com>

Add bindings that expose the auto rebalancing methods of both the static
and dynamic scheduler.

As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
takes a possibly very large list of migration candidates, the binding
takes a more compact representation, which reduces the size that needs
to be generated on the caller's side and therefore the runtime of the
serialization from Perl to Rust.

Additionally, the input data is validated while decomposing the compact
representation, since the underlying scoring methods do not validate
whether their input is consistent with the cluster usage.

The method names score_best_balancing_migration_candidates{,_topsis}()
are chosen deliberately, so that future extensions can implement
score_best_balancing_migrations{,_topsis}(), which might allow scoring
migrations without providing the candidates.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- improve patch message and documentation
- move to the end of the perl-rs changes, which makes it more consistent
  with the change order in pve-ha-manager as well
- use `UsageAggregator` now to discern how usages are accumulated
- s/generate_migration_candidates_from
   /decompose_compact_migration_candidates
- make the decomposition of compact migration candidates more robust and
  do not use any unwraps or other causes of panic but the Mutex guard
  unwrap
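
For readers skimming the series, the decomposition described in the patch
message can be sketched roughly as follows. This is only an illustration
under simplifying assumptions, not the code from this patch: `Stats`,
`Compact`, `Expanded`, `decompose` and the `HashMap`-based cluster view are
hypothetical stand-ins for the real crate types, errors are plain strings,
and the check for non-stationary resources is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-resource usage statistics.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Stats {
    cpu: f64,
    mem: u64,
}

/// Hypothetical stand-in for the compact candidate passed in by the caller.
struct Compact {
    leader: String,
    resources: Vec<String>,
    nodes: Vec<String>,
}

/// Hypothetical stand-in for one expanded migration candidate.
#[derive(Debug, PartialEq)]
struct Expanded {
    sid: String,
    source_node: String,
    target_node: String,
    stats: Stats,
}

/// Expands each compact candidate into one candidate per target node.
///
/// `cluster` maps a resource id to its current node and its stats
/// (an assumption made for this sketch only).
fn decompose(
    cluster: &HashMap<String, (String, Stats)>,
    compact: Vec<Compact>,
) -> Result<Vec<Expanded>, String> {
    let mut out = Vec::with_capacity(compact.len());

    for c in compact {
        // The leader must be known to the cluster usage.
        let (leader_node, _) = cluster
            .get(&c.leader)
            .ok_or_else(|| format!("leader '{}' is not present", c.leader))?;

        // The leader must be part of its own bundle.
        if !c.resources.contains(&c.leader) {
            return Err(format!("leader '{}' is not in the resources list", c.leader));
        }

        // Sum up the stats of the whole bundle, validating each member.
        let mut bundle = Stats::default();
        for sid in &c.resources {
            let (node, stats) = cluster
                .get(sid)
                .ok_or_else(|| format!("resource '{sid}' is not present"))?;
            if node != leader_node {
                return Err(format!("resource '{sid}' is on another node than the leader"));
            }
            bundle.cpu += stats.cpu;
            bundle.mem += stats.mem;
        }

        // One expanded candidate per possible target node.
        for target_node in c.nodes {
            out.push(Expanded {
                sid: c.leader.clone(),
                source_node: leader_node.clone(),
                target_node,
                stats: bundle,
            });
        }
    }

    Ok(out)
}
```

In this sketch, a single compact entry with two target nodes expands into
two candidates that share the same summed bundle stats, which is where the
size reduction on the caller's side comes from.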

 .../resource_scheduling/pve_dynamic.rs        | 57 +++++++++++-
 .../resource_scheduling/pve_static.rs         | 56 +++++++++++-
 .../bindings/resource_scheduling/resource.rs  | 88 ++++++++++++++++++-
 .../src/bindings/resource_scheduling/usage.rs | 15 ++++
 4 files changed, 211 insertions(+), 5 deletions(-)

diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
index 5b4373e..26f36d1 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -14,10 +14,15 @@ pub mod pve_rs_resource_scheduling_dynamic {
     use perlmod::Value;
     use proxmox_resource_scheduling::node::NodeStats;
     use proxmox_resource_scheduling::resource::ResourceStats;
+    use proxmox_resource_scheduling::scheduler::ScoredMigration;
     use proxmox_resource_scheduling::usage::Usage;
 
-    use crate::bindings::resource_scheduling::resource::PveResource;
-    use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+    use crate::bindings::resource_scheduling::resource::{
+        CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+    };
+    use crate::bindings::resource_scheduling::usage::{
+        IdentityAggregator, StartingAsStartedResourceAggregator,
+    };
 
     perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
 
@@ -157,6 +162,54 @@ pub mod pve_rs_resource_scheduling_dynamic {
         usage.remove_resource(sid);
     }
 
+    /// Method: Returns the load imbalance among the nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+    #[export]
+    pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+        let usage = this.inner.lock().unwrap();
+
+        usage.to_scheduler::<IdentityAggregator>().node_imbalance()
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// exhaustive search.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        Ok(usage
+            .to_scheduler::<IdentityAggregator>()
+            .score_best_balancing_migration_candidates(candidates, limit))
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// the TOPSIS method.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates_topsis(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        usage
+            .to_scheduler::<IdentityAggregator>()
+            .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+    }
+
     /// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
     ///
     /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index e2756db..7924889 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -14,10 +14,14 @@ pub mod pve_rs_resource_scheduling_static {
     use perlmod::Value;
     use proxmox_resource_scheduling::node::NodeStats;
     use proxmox_resource_scheduling::resource::ResourceStats;
+    use proxmox_resource_scheduling::scheduler::ScoredMigration;
     use proxmox_resource_scheduling::usage::Usage;
 
     use crate::bindings::resource_scheduling::{
-        resource::PveResource, usage::StartedResourceAggregator,
+        resource::{
+            CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+        },
+        usage::StartedResourceAggregator,
     };
 
     perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
@@ -154,6 +158,56 @@ pub mod pve_rs_resource_scheduling_static {
         usage.remove_resource(sid);
     }
 
+    /// Method: Returns the load imbalance among the nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+    #[export]
+    pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+        let usage = this.inner.lock().unwrap();
+
+        usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .node_imbalance()
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// exhaustive search.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        Ok(usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .score_best_balancing_migration_candidates(candidates, limit))
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// the TOPSIS method.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates_topsis(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+    }
+
     /// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
     ///
     /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
index 91d56b9..9186d5b 100644
--- a/pve-rs/src/bindings/resource_scheduling/resource.rs
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -1,6 +1,8 @@
 use anyhow::{Error, bail};
-use proxmox_resource_scheduling::resource::{
-    Resource, ResourcePlacement, ResourceState, ResourceStats,
+use proxmox_resource_scheduling::{
+    resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+    scheduler::{Migration, MigrationCandidate},
+    usage::Usage,
 };
 
 use serde::{Deserialize, Serialize};
@@ -42,3 +44,85 @@ impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
         Ok(Resource::new(resource.stats.into(), state, placement))
     }
 }
+
+/// A compact representation of [`proxmox_resource_scheduling::scheduler::MigrationCandidate`].
+#[derive(Serialize, Deserialize)]
+pub struct CompactMigrationCandidate {
+    /// The identifier of the leading resource.
+    pub leader: String,
+    /// The resources which are part of the leading resource's bundle.
+    pub resources: Vec<String>,
+    /// The nodes that the resources can be migrated to.
+    pub nodes: Vec<String>,
+}
+
+/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
+/// usage from `usage`.
+///
+/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
+///
+/// - the `leader` is not present in the cluster usage or in its own `resources` list
+/// - the `leader` is non-stationary
+/// - any resource in `resources` is not present in the cluster usage
+/// - any resource in `resources` is non-stationary
+/// - any resource in `resources` is on another node than the `leader`
+pub(crate) fn decompose_compact_migration_candidates(
+    usage: &Usage,
+    compact_candidates: Vec<CompactMigrationCandidate>,
+) -> Result<Vec<MigrationCandidate>, Error> {
+    // The length of `compact_candidates` is at least a lower bound
+    let mut candidates = Vec::with_capacity(compact_candidates.len());
+
+    for candidate in compact_candidates.into_iter() {
+        let leader_sid = candidate.leader;
+        let leader = match usage.get_resource(&leader_sid) {
+            Some(resource) => resource,
+            _ => bail!("leader '{leader_sid}' is not present in the cluster usage"),
+        };
+        let leader_node = match leader.placement() {
+            ResourcePlacement::Stationary { current_node } => current_node,
+            _ => bail!("leader '{leader_sid}' is non-stationary"),
+        };
+
+        if !candidate.resources.contains(&leader_sid) {
+            bail!("leader '{leader_sid}' is not present in the resources list");
+        }
+
+        let mut resource_stats = Vec::with_capacity(candidate.resources.len());
+
+        for sid in candidate.resources.iter() {
+            let resource = match usage.get_resource(sid) {
+                Some(resource) => resource,
+                _ => bail!("resource '{sid}' is not present in the cluster usage"),
+            };
+
+            match resource.placement() {
+                ResourcePlacement::Stationary { current_node } => {
+                    if current_node != leader_node {
+                        bail!("resource '{sid}' is on other node than leader");
+                    }
+
+                    resource_stats.push(resource.stats());
+                }
+                _ => bail!("resource '{sid}' is non-stationary"),
+            }
+        }
+
+        let bundle_stats = resource_stats.into_iter().sum();
+
+        for target_node in candidate.nodes.into_iter() {
+            let migration = Migration {
+                sid: leader_sid.to_string(),
+                source_node: leader_node.to_string(),
+                target_node,
+            };
+
+            candidates.push(MigrationCandidate {
+                migration,
+                stats: bundle_stats,
+            });
+        }
+    }
+
+    Ok(candidates)
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index 87b7e3e..48f6e84 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -4,6 +4,21 @@ use proxmox_resource_scheduling::{
     usage::{Usage, UsageAggregator},
 };
 
+/// The identity aggregator, which passes the node stats as-is.
+pub(crate) struct IdentityAggregator;
+
+impl UsageAggregator for IdentityAggregator {
+    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+        usage
+            .nodes_iter()
+            .map(|(nodename, node)| NodeUsage {
+                name: nodename.to_string(),
+                stats: node.stats(),
+            })
+            .collect()
+    }
+}
+
 /// An aggregator, which adds any resource as a started resource.
 ///
 /// This aggregator is useful if the node base stats do not have any current usage.
-- 
2.47.3




