From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 877771FF144 for ; Tue, 10 Mar 2026 11:40:36 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id A05CB181DB; Tue, 10 Mar 2026 11:40:28 +0100 (CET) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Tue, 10 Mar 2026 11:40:21 +0100 Message-Id: Subject: Re: [RFC proxmox 4/5] resource-scheduling: implement rebalancing migration selection From: "Daniel Kral" To: "Dominik Rusovac" , X-Mailer: aerc 0.21.0-38-g7088c3642f2c-dirty References: <20260217141437.584852-1-d.kral@proxmox.com> <20260217141437.584852-5-d.kral@proxmox.com> In-Reply-To: X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1773139188129 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.034 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: PSJXCSZ2ES2GK6QC665MUPH7BGDI42YC X-Message-ID-Hash: PSJXCSZ2ES2GK6QC665MUPH7BGDI42YC X-MailFrom: d.kral@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: Thanks a lot for taking the time to review this! I've added some inline notes, though most of them are just ACKs. On Mon Mar 9, 2026 at 2:32 PM CET, Dominik Rusovac wrote: > Nice work! Here are my thoughts and some things I noticed. > > On Tue Feb 17, 2026 at 3:13 PM CET, Daniel Kral wrote: >> Assuming that a service will hold the same dynamic resource usage on a >> new node as on the previous node, score possible migrations, where: >> >> - the cluster node imbalance is minimal (bruteforce), or >> >> - the shifted root mean square and maximum resource usages of the cpu >> and memory is minimal across the cluster nodes (TOPSIS). >> >> Signed-off-by: Daniel Kral >> --- >> score_best_balancing_migrations() and select_best_balancing_migration() >> are separate because there could be future improvements for the single >> select, but might be unnecessary and redundant (especially since we need >> to expose it at perlmod and PVE::HA::Usage::{Dynamic,Static} twice). > > Regarding future improvements for the "single select", providing an > `imbalance_threshold` to use for early termination may pay off > performance-wise, e.g., something along the lines of: The main idea to limit it in size is because of the serialization/deserialization to Perl values, but writing it out makes it clear that this should probably be the responsibility of the caller (i.e., the perlmod bindings). I might consider dropping the select_best_balancing_migration(...) here entirely as we can still add it later on and use score_best_balancing_migrations(..., limit=3D1) anywhere else now... > > pub fn select_best_balancing_migration( > &self, > candidates: I, > imbalance_threshold: Option > ) -> Option > where > I: IntoIterator, > { > let threshold =3D imbalance_threshold.unwrap_or(0.0); > > let mut scored_migrations =3D BinaryHeap::new(); > for candidate in candidates { > let imbalance =3D self.node_imbalance_with_migration(&can= didate); > > let migration =3D ScoredMigration { > migration: candidate.into(), > imbalance, > }; > > if imbalance <=3D threshold { > return Some(migration); > } Hm, even though this could be nice, I'm not so sure about it as setting a threshold externally might not be that easy and this will not consider any later possibly better migration candidates? > > scored_migrations.push(migration); > } > > scored_migrations.pop() > } >> > > [snip]=20 > >> +fn calculate_node_loads(nodes: &[NodeStats]) -> Vec { >> + nodes.iter().map(|stats| stats.load()).collect() >> +} >> + >> +/// Returns the load imbalance among the nodes. >> +/// >> +/// The load balance is measured as the statistical dispersion of the i= ndividual node loads. >> +/// >> +/// The current implementation uses the dimensionless coefficient of va= riation, which expresses the >> +/// standard deviation in relation to the average mean of the node load= s. Additionally, the >> +/// coefficient of variation is not robust, which is > > Parts of the docs are missing, it seems. Right, will fix that in a v2! > >> +fn calculate_node_imbalance(nodes: &[NodeStats]) -> f64 { >> + let node_count =3D nodes.len(); >> + let node_loads =3D calculate_node_loads(nodes); >> + >> + let load_sum =3D node_loads >> + .iter() >> + .fold(0.0, |sum, node_load| sum + node_load); > > To increase readability, consider: > =20 > let load_sum =3D node_loads.iter().sum::(); Will do! I still want to document what Dominik pointed out to me off-list: Even though that Iterator::sum() will return -0.0 for iterators that yield nothing, but the following `load_sum =3D=3D 0.0` test will still hold true as -0.0 =3D=3D 0.0 in IEEE754 [0]. [0] https://en.wikipedia.org/wiki/IEEE_754#Comparison_predicates > >> + >> + // load_sum is guaranteed to be 0.0 for empty nodes >> + if load_sum =3D=3D 0.0 { >> + 0.0 >> + } else { >> + let load_mean =3D load_sum / node_count as f64; >> + >> + let squared_diff_sum =3D node_loads >> + .iter() >> + .fold(0.0, |sum, node_load| sum + (node_load - load_mean).p= owi(2)); >> + let load_sd =3D (squared_diff_sum / node_count as f64).sqrt(); >> + >> + load_sd / load_mean >> + } >> } > > [0] Considering where this function is used, expecting a > designated closure that is used to map `NodeUsage` onto `f64` seems=20 > to be more convenient, e.g.: > > fn calculate_node_imbalance( > nodes: &[NodeUsage], > to_load: impl FnMut(&NodeUsage) -> f64, > ) -> f64 { > let node_count =3D nodes.len(); > let node_loads =3D nodes.iter().map(to_load).collect::>(); > > // ... > } > Great idea! I'll change it to this in the v2! >> =20 > > [snip] > >> + fn node_stats(&self) -> Vec { >> + self.nodes.iter().map(|node| node.stats).collect() >> + } >> + >> + /// Returns the individual node loads. >> + pub fn node_loads(&self) -> Vec<(String, f64)> { >> + self.nodes >> + .iter() >> + .map(|node| (node.name.to_string(), node.stats.load())) >> + .collect() >> + } >> + >> + /// Returns the load imbalance among the nodes. >> + /// >> + /// See [`calculate_node_imbalance`] for more information. >> + pub fn node_imbalance(&self) -> f64 { >> + let node_stats =3D self.node_stats(); >> + >> + calculate_node_imbalance(&node_stats) >> + } > > Regarding [0], passing a closure compresses two iterations into one, e.g.= : > > pub fn node_imbalance(&self) -> f64 { > calculate_node_imbalance(&self.nodes, |node| node.stats.load()) > } > ACK >> + >> + /// Returns the load imbalance among the nodes as if a specific ser= vice was moved. >> + /// >> + /// See [`calculate_node_imbalance`] for more information. >> + pub fn node_imbalance_with_migration(&self, migration: &MigrationCa= ndidate) -> f64 { >> + let mut new_node_stats =3D Vec::with_capacity(self.nodes.len())= ; >> + >> + self.nodes.iter().for_each(|node| { >> + let mut new_stats =3D node.stats; >> + >> + if node.name =3D=3D migration.source_node { >> + new_stats.remove_running_service(&migration.stats); >> + } else if node.name =3D=3D migration.target_node { >> + new_stats.add_running_service(&migration.stats); >> + } >> + >> + new_node_stats.push(new_stats); >> + }); >> + >> + calculate_node_imbalance(&new_node_stats) >> + } > > Same here regarding [0], e.g.: > > pub fn node_imbalance_with_migration(&self, migration: &MigrationCand= idate) -> f64 { > calculate_node_imbalance_some(&self.nodes, |node| { > let mut new_stats =3D node.stats; > > if node.name =3D=3D migration.source_node { > new_stats.remove_running_service(&migration.stats); > } else if node.name =3D=3D migration.target_node { > new_stats.add_running_service(&migration.stats); > } > > new_stats.load() > }) > } > ACK, nice! >> + >> + /// Score the service motions by the best node imbalance improvemen= t with exhaustive search. >> + pub fn score_best_balancing_migrations( >> + &self, >> + candidates: I, >> + limit: usize, >> + ) -> Result, Error> > > Does this really have to return a `Result`? It uses no try operator and > provides no error handling. No it does not, seems like it was more out of habit than necessity. Will remove the Result here in a v2! > >> + where >> + I: IntoIterator, >> + { >> + let mut scored_migrations =3D candidates >> + .into_iter() >> + .map(|candidate| { >> + let imbalance =3D self.node_imbalance_with_migration(&c= andidate); >> + >> + ScoredMigration { >> + migration: candidate.into(), >> + imbalance, >> + } >> + }) >> + .collect::>(); >> + >> + let mut best_alternatives =3D Vec::new(); > > Since `limit` declares the capacity of `best_alternatives`, consider: > > let mut best_alternatives =3D Vec::with_capacity(limit); Thanks, ACK! > >> + >> + // BinaryHeap::into_iter_sorted() is still in nightly unfortuna= tely >> + while best_alternatives.len() < limit { >> + match scored_migrations.pop() { >> + Some(alternative) =3D> best_alternatives.push(alternati= ve), >> + None =3D> break, >> + } >> + } >> + >> + Ok(best_alternatives) >> + }