From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 26 Mar 2026 15:16:17 +0100
From: "Daniel Kral" <d.kral@proxmox.com>
To: "Dominik Rusovac"
Subject: Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
References: <20260324183029.1274972-1-d.kral@proxmox.com>
 <20260324183029.1274972-5-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

On Thu Mar 26, 2026 at 11:19 AM CET, Dominik Rusovac wrote:
> [snip]
>
> good to have:
>
> #[derive(Clone, Debug)]

ACK, thanks!
>
>> +pub struct Scheduler {
>> +    nodes: Vec<NodeUsage>,
>> +}
>> +
>
> nit: The implementation of `Scheduler` is totally fine as-is. This is
> just my two cents, as this was mentioned off-list.
>
> I believe that for the implementation of the scheduler, working with
> enum variants and a trait and then exploiting static dispatch is more
> convenient and easier to maintain, e.g.:
>
> pub enum Schedulerr {
>     Topsis(Nodes),
>     BruteForce(Nodes),
> }
>
> pub trait Decide {
>     fn node_imbalance(&self) -> f64;
>
>     fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
>
>     fn score_best_balancing_migration_candidates(
>         &self,
>         candidates: &[MigrationCandidate],
>         limit: usize,
>     ) -> Result<Vec<MigrationCandidate>, Error>;
> }
>
> impl Decide for Schedulerr {
>     [...]
> }
>
> pub fn score_best_balancing_migration_candidates(
>     scheduler: &impl Decide,
>     candidates: &[MigrationCandidate],
>     limit: usize,
> ) -> Result<Vec<MigrationCandidate>, Error> {
>     scheduler.score_best_balancing_migration_candidates(candidates, limit)
> }
>
> In a nutshell, this declares what a scheduler ought to be able to do to be used
> for scoring (that is, implementing the `Decide` trait); and implements
> all the functionality for all the variants in one place.
>
> Nice side-effects of this design:
> * one scoring function implements all the variants in one place, which is nice, I think
> * adding/removing a scheduler variant would become more systematic
> * modifying scheduler variants in terms of how they score or, for example, how they measure
>   imbalance, would also be more straightforward
>
> Again, just my two cents.

As discussed off-list, I like how readable the different method
alternatives are with the pattern matching, and that makes maintenance
and reading diffs easier. Though I'm not sure whether it is a good idea
to define the `Scheduler` by the algorithms that one or more of its
methods use.
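To make the alternative I have in mind concrete, here is a rough,
self-contained sketch of passing the preferred algorithm per call
instead of encoding it in the scheduler type (all names and the dummy
scoring logic here are made up, not taken from the patch):

```rust
// Hypothetical sketch: the algorithm is a per-call parameter, so the
// scheduler type stays independent of any particular scoring strategy.
#[derive(Clone, Copy, Debug)]
pub enum Algorithm {
    Topsis,
    BruteForce,
}

pub struct Scheduler {
    // Placeholder for the real per-node usage data.
    node_loads: Vec<f64>,
}

impl Scheduler {
    /// Dispatches on the requested algorithm with a plain `match`; each
    /// arm stays readable and diffs only touch the arm that changes.
    pub fn score(&self, algorithm: Algorithm) -> f64 {
        match algorithm {
            // Dummy computations standing in for the real scoring.
            Algorithm::Topsis => {
                self.node_loads.iter().sum::<f64>() / self.node_loads.len() as f64
            }
            Algorithm::BruteForce => {
                self.node_loads.iter().cloned().fold(f64::MIN, f64::max)
            }
        }
    }
}

fn main() {
    let scheduler = Scheduler {
        node_loads: vec![0.2, 0.4, 0.6],
    };
    println!("{:.1}", scheduler.score(Algorithm::Topsis));
    println!("{:.1}", scheduler.score(Algorithm::BruteForce));
}
```

Callers then pick the algorithm at the call site, and adding a variant
is a compile-checked change to the `match`.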
Choosing one algorithm might only be something we want in the short
term, e.g., while we figure out whether TOPSIS or brute force is the
right fit for users; we might drop brute force or TOPSIS for these
methods in the future, or add another method that uses yet another
algorithm. Then it might be better if the methods themselves, such as
score_nodes_to_start_resource() and
score_best_balancing_migration_candidates(), define which algorithm
they use. In that case it is only coincidental that two methods use the
same algorithm internally.

Maybe we could still go for a parameter that allows passing the
preferred algorithm to score_best_balancing_migration_candidates(),
which for now could be either brute force or TOPSIS?

Still, thanks a lot for the suggestion! I like the idea a lot, but I'm
a little unsure whether it is the right fit in this situation with
regard to future changes.

>
>> +impl Scheduler {
>> +    /// Instantiate scheduler instance from node usages.
>> +    pub fn from_nodes<I, T>(nodes: I) -> Self
>> +    where
>> +        I: IntoIterator<Item = T>,
>> +        T: Into<NodeUsage>,
>> +    {
>> +        Self {
>> +            nodes: nodes.into_iter().map(|node| node.into()).collect(),
>> +        }
>> +    }
>> +
>> +    /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
>> +    ///
>> +    /// The scoring is done as if the resource is already started on each node. This assumes that
>> +    /// the already started resource consumes the maximum amount of each stat according to its
>> +    /// `resource_stats`.
>> +    ///
>> +    /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
>> +    /// score is better.
>> +    pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
>> +        &self,
>> +        resource_stats: T,
>> +    ) -> Result<Vec<(String, f64)>, Error> {
>> +        let len = self.nodes.len();
>> +        let resource_stats = resource_stats.into();
>> +
>> +        let matrix = self
>> +            .nodes
>> +            .iter()
>> +            .enumerate()
>> +            .map(|(target_index, _)| {
>> +                // Base values on percentages to allow comparing nodes with different stats.
>> +                let mut highest_cpu = 0.0;
>> +                let mut squares_cpu = 0.0;
>> +                let mut highest_mem = 0.0;
>> +                let mut squares_mem = 0.0;
>> +
>> +                for (index, node) in self.nodes.iter().enumerate() {
>> +                    let mut new_stats = node.stats;
>> +
>> +                    if index == target_index {
>> +                        new_stats.add_started_resource(&resource_stats)
>> +                    };
>> +
>> +                    let new_cpu = new_stats.cpu_load();
>> +                    highest_cpu = f64::max(highest_cpu, new_cpu);
>> +                    squares_cpu += new_cpu.powi(2);
>> +
>> +                    let new_mem = new_stats.mem_load();
>> +                    highest_mem = f64::max(highest_mem, new_mem);
>> +                    squares_mem += new_mem.powi(2);
>> +                }
>> +
>> +                // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
>> +                // 1.004 is only slightly more than 1.002.
>> +                PveTopsisAlternative {
>> +                    average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
>> +                    highest_cpu: 1.0 + highest_cpu,
>> +                    average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
>> +                    highest_memory: 1.0 + highest_mem,
>> +                }
>> +                .into()
>> +            })
>> +            .collect::<Vec<_>>();
>> +
>> +        let scores =
>> +            topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
>> +
>> +        Ok(scores
>> +            .into_iter()
>> +            .enumerate()
>> +            .map(|(n, score)| (self.nodes[n].name.to_string(), score))
>> +            .collect())
>> +    }
>> +}
>
> [snip]
>
> in the future, the proptest crate could come in handy for such helper
> functions.
>
> in general, I would propose to add proptests in the future.
>
> for some examples and inspiration, see [0].
>
> [0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/

+1

>
>> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
>> +    let (maxcpu, maxmem) = (16, 64 * (1 << 30));
>> +
>> +    let node1 = NodeUsage {
>> +        name: String::from("node1"),
>> +        stats: NodeStats {
>> +            cpu: 1.7,
>> +            maxcpu,
>> +            mem: 12334 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    let node2 = NodeUsage {
>> +        name: String::from("node2"),
>> +        stats: NodeStats {
>> +            cpu: 15.184,
>> +            maxcpu,
>> +            mem: 529 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    let node3 = NodeUsage {
>> +        name: String::from("node3"),
>> +        stats: NodeStats {
>> +            cpu: 5.2,
>> +            maxcpu,
>> +            mem: 9381 << 20,
>> +            maxmem,
>> +        },
>> +    };
>> +
>> +    Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>> +
>> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
>> +    let node1 = NodeUsage {
>> +        name: String::from("node1"),
>> +        stats: NodeStats {
>> +            cpu: 1.7,
>> +            maxcpu: 16,
>> +            mem: 12334 << 20,
>> +            maxmem: 128 << 30,
>> +        },
>> +    };
>> +
>> +    let node2 = NodeUsage {
>> +        name: String::from("node2"),
>> +        stats: NodeStats {
>> +            cpu: 15.184,
>> +            maxcpu: 32,
>> +            mem: 529 << 20,
>> +            maxmem: 96 << 30,
>> +        },
>> +    };
>> +
>> +    let node3 = NodeUsage {
>> +        name: String::from("node3"),
>> +        stats: NodeStats {
>> +            cpu: 5.2,
>> +            maxcpu: 24,
>> +            mem: 9381 << 20,
>> +            maxmem: 64 << 30,
>> +        },
>> +    };
>> +
>> +    Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>
> [snip]
>
>> +#[test]
>> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> +    let scheduler = new_homogeneous_cluster_scheduler();
>> +
>> +    let heavy_memory_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 1.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> +        vec!["node2", "node3", "node1"]
>> +    );
>> +
>> +    let heavy_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 0.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>
> nit: confusing variable name

ACK this and the following, that was a copy-paste error unfortunately.

>
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>
> nit: confusing variable name

ACK

>
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node2", "node3", "node1"]
>> +    );
>> +
>> +    Ok(())
>> +}
>> +
>> +#[test]
>> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> +    let scheduler = new_heterogeneous_cluster_scheduler();
>> +
>> +    let heavy_memory_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 1.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> +        vec!["node2", "node1", "node3"]
>> +    );
>> +
>> +    let heavy_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> +        vec!["node3", "node2", "node1"]
>> +    );
>> +
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 0.0,
>> +        mem: 0,
>> +        maxmem: 0,
>> +    };
>> +
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node1", "node3", "node2"]
>> +    );
>> +
>
> nit: confusing variable name

ACK

>
>> +    let unlimited_cpu_resource_stats = ResourceStats {
>> +        cpu: 0.0,
>> +        maxcpu: 12.0,
>> +        mem: 0,
>> +        maxmem: 12 << 30,
>> +    };
>> +
>
> nit: confusing variable name

ACK

>
>> +    assert_eq!(
>> +        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> +        vec!["node2", "node1", "node3"]
>> +    );
>> +
>> +    Ok(())
>> +}
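
P.S. as a side note on the scoring hunk quoted above: the effect of the
"+ 1.0" shift and of the quadratic mean can be illustrated with a small
standalone sketch (illustrative numbers and helper name only, not patch
code):

```rust
// Illustrative sketch: why the scoring code adds 1.0 before comparing
// loads. Without the shift, near-zero loads produce exaggerated ratios.
fn quadratic_mean(loads: &[f64]) -> f64 {
    (loads.iter().map(|l| l.powi(2)).sum::<f64>() / loads.len() as f64).sqrt()
}

fn main() {
    let (a, b) = (0.004, 0.002);
    // Raw ratio: 0.004 is twice as much as 0.002 ...
    println!("{:.2}", a / b); // 2.00
    // ... but after the +1.0 shift the difference is marginal.
    println!("{:.4}", (1.0 + a) / (1.0 + b)); // 1.0020
    // The quadratic mean penalizes imbalance more than the arithmetic
    // mean would: sqrt((0.2^2 + 0.8^2) / 2) instead of 0.5.
    println!("{:.3}", quadratic_mean(&[0.2, 0.8])); // 0.583
}
```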