public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Daniel Kral" <d.kral@proxmox.com>
To: "Dominik Rusovac" <d.rusovac@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
Date: Thu, 26 Mar 2026 15:11:28 +0100	[thread overview]
Message-ID: <DHCRWM3F526R.13E4LJL187H4K@proxmox.com> (raw)
In-Reply-To: <DHCNATRHTX0J.1X3B76NP84GES@proxmox.com>

On Thu Mar 26, 2026 at 11:34 AM CET, Dominik Rusovac wrote:
> lgtm 
>
> pls find my comments inline, mostly relating to nits or tiny things
>
> On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
>> Assuming that a resource will hold the same dynamic resource usage on a
>> new node as on the previous node, score possible migrations, where:
>>
>> - the cluster node imbalance is minimal (bruteforce), or
>> - the shifted root mean square and maximum resource usages of the cpu
>>   and memory is minimal across the cluster nodes (TOPSIS).
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> changes v1 -> v2:
>> - add saturating_sub() in remove_running_resource(...) (as suggested by
>>   @Thomas)
>> - slightly move declarations and impls around so that reading from
>>   top-to-bottom is a little easier
>> - pass NodeUsage vec instead of NodeStats vec to
>>   calculate_node_imbalance(...)
>> - pass a closure to calculate_node_imbalance(...) (as suggested by
>>   @Dominik)
>> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
>>   struct is now ordered first by the imbalance and then the strings in
>>   the `Migration` struct
>> - fix floating-point issue for the imbalance ordering for
>>   ScoredMigration
>> - correctly implement `Ord` (essentially removing the reverse() and
>>   moving these Reverse() wrappers to the usages for the BinaryHeap)
>> - use the `Migration` struct in `MigrationCandidate` as well
>> - drop Scheduler::node_stats() as it's unused now
>> - use Vec::with_capacity(...) where possible
>> - eagerly implement common traits (especially Clone and Debug)
>> - add test cases for the ScoredMigration ordering, node imbalance
>>   calculation and the two rebalancing migration scoring methods
>> - s/score_best_balancing_migrations
>>    /score_best_balancing_migration_candidates
>>   to possibly allow the Scheduler/Usage impls handling the migration
>>   candidate generation in the future instead of the callers
>
> [snip]
>
>> +/// Returns the load imbalance among the nodes.
>> +///
>> +/// The load balance is measured as the statistical dispersion of the individual node loads.
>> +///
>> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
>> +/// standard deviation in relation to the average mean of the node loads.
>> +///
>> +/// The coefficient of variation is not robust, which is a desired property here, because outliers
>> +/// should be detected as much as possible.
>> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
>
> very nice docs! 
>
> [snip]
>
>> +/// A possible migration.
>> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
>> +#[serde(rename_all = "kebab-case")]
>> +pub struct Migration {
>> +    /// The identifier of a leading resource.
>> +    pub sid: String,
>> +    /// The current node of the leading resource.
>> +    pub source_node: String,
>> +    /// The possible migration target node for the resource.
>> +    pub target_node: String,
>
> nit: on the long run, instead of having `ScoredMigration`, 
> it could be more convenient to have a field:
>
>     pub imbalance: Option<f64>,
>

Might make sense, but then we can't reuse the same structure in
`MigrationCandidate` anymore, so I would let ScoredMigration be it's own
type, what do you think?

>> +}
>
> [snip]
>
>> +impl ScoredMigration {
>> +    pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
>> +        // Depending how the imbalance is calculated, it can contain minor approximation errors. As
>
>            // Depending [on] how [...]

Thanks, will change that!

>
>> +        // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
>> +        // where the imbalance is the same up to the significant digits in base 10, but treated as
>> +        // different values.
>> +        //
>> +        // Therefore, truncate any non-significant digits to prevent these cases.
>> +        let factor = 10_f64.powf(f64::DIGITS as f64);
>> +        let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
>
> Nice solution, this appears to be a clean approach to achieve deterministic `Ord`
> for `f64`. 
>
> One small thing, tho: `f64::DIGITS` is technically not a floating number, but `15_u32`.
>
>            let factor = 10_f64.powi(f64::DIGITS as i32);
>
> thus, seems to be the better choice here. `powi` is also generally faster than `powf` [0]. 
>
> [0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=Using%20this%20function%20is%20generally%20faster%20than%20using%20powf

Thanks, good catch! I also weren't aware of them being non-deterministic
here, that's good to know.

We briefly also talked about this off-list, I'll adapt the test cases
below to not eq() the ScoredMigration directly as the truncation here
might be non-deterministic (and I assumed otherwise).

More-so, it's more important to verify the order in which the migrations
are scored than their exact imbalance score as you suggested.

>
> [snip]
>
>> +/// A possible migration candidate with the migrated usage stats.
>> +#[derive(Clone, Debug)]
>> +pub struct MigrationCandidate {
>> +    /// The possible migration.
>> +    pub migration: Migration,
>> +    /// The to-be-migrated resource usage stats.
>
> imo, easier to comprehend: 
>
>     /// Usage stats of the resource to be migrated

Nice, will change it to that!

>
>> +    pub stats: ResourceStats,
>> +}
>
> [snip]





  reply	other threads:[~2026-03-26 14:11 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
2026-03-26 10:10   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
2026-03-26 10:11   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
2026-03-26 10:12   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
2026-03-26 10:19   ` Dominik Rusovac
2026-03-26 14:16     ` Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
2026-03-26 10:28   ` Dominik Rusovac
2026-03-26 14:15     ` Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
2026-03-26 10:29   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
2026-03-26 10:29   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
2026-03-26 10:30   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
2026-03-26 10:34   ` Dominik Rusovac
2026-03-26 14:11     ` Daniel Kral [this message]
2026-03-27  9:34       ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
2026-03-27  9:38   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
2026-03-27  9:39   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
2026-03-27  9:41   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
2026-03-27 14:13   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
2026-03-27 14:18   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
2026-03-27 14:15   ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
2026-03-27 14:16   ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
2026-03-26 16:08   ` Jillian Morgan
2026-03-26 16:20     ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
2026-03-25 21:43   ` Thomas Lamprecht
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DHCRWM3F526R.13E4LJL187H4K@proxmox.com \
    --to=d.kral@proxmox.com \
    --cc=d.rusovac@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal