public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Daniel Kral" <d.kral@proxmox.com>
To: "Dominik Rusovac" <d.rusovac@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
Date: Thu, 26 Mar 2026 15:16:17 +0100	[thread overview]
Message-ID: <DHCS0AP47PZO.3K98MYBZB6RYI@proxmox.com> (raw)
In-Reply-To: <DHCMYPYGMY0I.179IM7YU0XVMC@proxmox.com>

On Thu Mar 26, 2026 at 11:19 AM CET, Dominik Rusovac wrote:
> [snip]
>
> good to have: 
>
>     #[derive(Clone, Debug)]

ACK, thanks!

>
>> +pub struct Scheduler {
>> +    nodes: Vec<NodeUsage>,
>> +}
>> +
>
> nit: The implementation of `Scheduler` is totally fine as-is.  This is
> just my two cents, as this was mentioned off-list. 
>
> I believe that for the implementation of the scheduler working with
> enum variants and a trait and then exploiting static dispatch is more
> convenient and easier to maintain, e.g.:
>
>     pub enum Schedulerr<Nodes> {
>         Topsis(Nodes),
>         BruteForce(Nodes),
>     }
>     
>     pub trait Decide {
>         fn node_imbalance(&self) -> f64;
>     
>         fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
>     
>         fn score_best_balancing_migration_candidates(
>             &self,
>             candidates: &[MigrationCandidate],
>             limit: usize,
>         ) -> Result<Vec<ScoredMigration>, Error>;
>     }
>     
>     impl Decide for Schedulerr<Vec<NodeUsage>> {

[...]

>     }
>     
>     pub fn score_best_balancing_migration_candidates(
>         scheduler: &impl Decide,
>         candidates: &[MigrationCandidate],
>         limit: usize,
>     ) -> Result<Vec<ScoredMigration>, Error> {
>         scheduler.score_best_balancing_migration_candidates(candidates, limit)
>     }
>
> In a nutshell, this declares what a scheduler ought to be able to do to be used
> for scoring (that is, implementing the `Decide` trait); and implements
> all the functionality for all the variants in one place.  
>
> Nice side-effects of this design: 
> * one scoring function implements all the variants in one place, which is nice, I think
> * adding/removing a scheduler variant would become more systematic
> * modifying scheduler variants in terms of how they score or, for example, how they measure
>   imbalance, would also be more straightforward
>
> Again, just my two cents.

As discussed off-list, I like how readable the different method
alternatives are with the pattern matching and that makes maintenance
and reading diffs easier.

Though I'm not sure whether it is a good idea to define the `Scheduler`
by the algorithms that one or more of their methods use. Choosing one
algorithm might only be something we want in the short-time, e.g.,
whether TOPSIS or bruteforce is the right fit for users, and might drop
bruteforce or TOPSIS for these methods in the future or might add
another method, which uses another new algorithm.

Then it might be better that the methods themselves, such as
score_nodes_to_start_resource() and
score_best_balancing_migration_candidates() defines which algorithm
these use. In that case it is only coincidental that two methods use the
same algorithm internally.

Maybe we could still go for some parameter that allows passing the
preferred algorithm for score_best_balancing_migration_candidates(),
which now can either be Bruteforce or Topsis for the latter?

Still, thanks a lot for the suggestion! I like the idea a lot, but I'm
only a little unsure whether it is the right fit in this situation wrt.
to future changes.

>
>> +impl Scheduler {
>> +    /// Instantiate scheduler instance from node usages.
>> +    pub fn from_nodes<I>(nodes: I) -> Self
>> +    where
>> +        I: IntoIterator<Item: Into<NodeUsage>>,
>> +    {
>> +        Self {
>> +            nodes: nodes.into_iter().map(|node| node.into()).collect(),
>> +        }
>> +    }
>> +
>> +    /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
>> +    ///
>> +    /// The scoring is done as if the resource is already started on each node. This assumes that
>> +    /// the already started resource consumes the maximum amount of each stat according to its
>> +    /// `resource_stats`.
>> +    ///
>> +    /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
>> +    /// score is better.
>> +    pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
>> +        &self,
>> +        resource_stats: T,
>> +    ) -> Result<Vec<(String, f64)>, Error> {
>> +        let len = self.nodes.len();
>> +        let resource_stats = resource_stats.into();
>> +
>> +        let matrix = self
>> +            .nodes
>> +            .iter()
>> +            .enumerate()
>> +            .map(|(target_index, _)| {
>> +                // Base values on percentages to allow comparing nodes with different stats.
>> +                let mut highest_cpu = 0.0;
>> +                let mut squares_cpu = 0.0;
>> +                let mut highest_mem = 0.0;
>> +                let mut squares_mem = 0.0;
>> +
>> +                for (index, node) in self.nodes.iter().enumerate() {
>> +                    let mut new_stats = node.stats;
>> +
>> +                    if index == target_index {
>> +                        new_stats.add_started_resource(&resource_stats)
>> +                    };
>> +
>> +                    let new_cpu = new_stats.cpu_load();
>> +                    highest_cpu = f64::max(highest_cpu, new_cpu);
>> +                    squares_cpu += new_cpu.powi(2);
>> +
>> +                    let new_mem = new_stats.mem_load();
>> +                    highest_mem = f64::max(highest_mem, new_mem);
>> +                    squares_mem += new_mem.powi(2);
>> +                }
>> +
>> +                // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
>> +                // 1.004 is only slightly more than 1.002.
>> +                PveTopsisAlternative {
>> +                    average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
>> +                    highest_cpu: 1.0 + highest_cpu,
>> +                    average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
>> +                    highest_memory: 1.0 + highest_mem,
>> +                }
>> +                .into()
>> +            })
>> +            .collect::<Vec<_>>();
>> +
>> +        let scores =
>> +            topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
>> +
>> +        Ok(scores
>> +            .into_iter()
>> +            .enumerate()
>> +            .map(|(n, score)| (self.nodes[n].name.to_string(), score))
>> +            .collect())
>> +    }
>>  }
>
> [snip]
>
> in the future, the proptest crate could come in handy for such helper
> functions. 
>
> in general, I would propose to add proptests in the future.
>
> for some examples and inspiration, see [0].
>
> [0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/

+1

>
>> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
>> +    let (maxcpu, maxmem) = (16, 64 * (1 << 30));
>> +
>> +    let node1 = NodeUsage {
>> +        name: String::from("node1"),
>> +        stats: NodeStats {
>> +            cpu: 1.7,
>> +            maxcpu,
>> +            mem: 12334 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    let node2 = NodeUsage {
>> +        name: String::from("node2"),
>> +        stats: NodeStats {
>> +            cpu: 15.184,
>> +            maxcpu,
>> +            mem: 529 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    let node3 = NodeUsage {
>> +        name: String::from("node3"),
>> +        stats: NodeStats {
>> +            cpu: 5.2,
>> +            maxcpu,
>> +            mem: 9381 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>> +
>> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
>> +    let node1 = NodeUsage {
>> +        name: String::from("node1"),
>> +        stats: NodeStats {
>> +            cpu: 1.7,
>> +            maxcpu: 16,
>> +            mem: 12334 << 20,
>> +            maxmem: 128 << 30,
>> +        },
>> +    };
>> +
>> +    let node2 = NodeUsage {
>> +        name: String::from("node2"),
>> +        stats: NodeStats {
>> +            cpu: 15.184,
>> +            maxcpu: 32,
>> +            mem: 529 << 20,
>> +            maxmem: 96 << 30,
>> +        },
>> +    };
>> +
>> +    let node3 = NodeUsage {
>> +        name: String::from("node3"),
>> +        stats: NodeStats {
>> +            cpu: 5.2,
>> +            maxcpu: 24,
>> +            mem: 9381 << 20,
>> +            maxmem: 64 << 30,
>> +        },
>> +    };
>> +
>> +    Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>
> [snip]
>
>> +#[test]
>> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> +    let scheduler = new_homogeneous_cluster_scheduler();
>> +
>> +    let heavy_memory_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 1.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> +        vec!["node2", "node3", "node1"]
>> +    );
>> +
>> +    let heavy_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 0.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>
> nit: confusing variable name

ACK this and the following, that was a copy-paste error unfortunately.

>
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>
> nit: confusing variable name

ACK

>
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node2", "node3", "node1"]
>> +    );
>> +
>> +    Ok(())
>> +}
>> +
>> +#[test]
>> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> +    let scheduler = new_heterogeneous_cluster_scheduler();
>> +
>> +    let heavy_memory_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 1.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> +        vec!["node2", "node1", "node3"]
>> +    );
>> +
>> +    let heavy_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> +        vec!["node3", "node2", "node1"]
>> +    );
>> +
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 0.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>
> nit: confusing variable name
>

ACK

>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>
> nit: confusing variable name
>

ACK

>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node2", "node1", "node3"]
>> +    );
>> +
>> +    Ok(())
>> +}





  reply	other threads:[~2026-03-26 14:16 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
2026-03-26 10:10   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
2026-03-26 10:11   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
2026-03-26 10:12   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
2026-03-26 10:19   ` Dominik Rusovac
2026-03-26 14:16     ` Daniel Kral [this message]
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
2026-03-26 10:28   ` Dominik Rusovac
2026-03-26 14:15     ` Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
2026-03-26 10:29   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
2026-03-26 10:29   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
2026-03-26 10:30   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
2026-03-26 10:34   ` Dominik Rusovac
2026-03-26 14:11     ` Daniel Kral
2026-03-27  9:34       ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
2026-03-27  9:38   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
2026-03-27  9:39   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
2026-03-27  9:41   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
2026-03-27 14:13   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
2026-03-27 14:18   ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
2026-03-27 14:15   ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
2026-03-27 14:16   ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
2026-03-26 16:08   ` Jillian Morgan
2026-03-26 16:20     ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
2026-03-25 21:43   ` Thomas Lamprecht
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DHCS0AP47PZO.3K98MYBZB6RYI@proxmox.com \
    --to=d.kral@proxmox.com \
    --cc=d.rusovac@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal