From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id B57261FF13F for ; Thu, 26 Mar 2026 11:35:10 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 91950CF67; Thu, 26 Mar 2026 11:35:32 +0100 (CET) Content-Type: text/plain; charset=UTF-8 Date: Thu, 26 Mar 2026 11:34:56 +0100 Message-Id: Subject: Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection From: "Dominik Rusovac" To: "Daniel Kral" , Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 X-Mailer: aerc 0.20.0 References: <20260324183029.1274972-1-d.kral@proxmox.com> <20260324183029.1274972-10-d.kral@proxmox.com> In-Reply-To: <20260324183029.1274972-10-d.kral@proxmox.com> X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1774521248525 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.164 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment POISEN_SPAM_PILL 0.1 Meta: its spam POISEN_SPAM_PILL_1 0.1 random spam to be learned in bayes POISEN_SPAM_PILL_3 0.1 random spam to be learned in bayes RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: VMNQP2VNRQKU3NY4IAD4R7FAMOU6HOJN X-Message-ID-Hash: VMNQP2VNRQKU3NY4IAD4R7FAMOU6HOJN X-MailFrom: d.rusovac@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: lgtm=20 pls find my comments inline, mostly relating to nits or tiny things On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote: > Assuming that a resource will hold the same dynamic resource usage on a > new node as on the previous node, score possible migrations, where: > > - the cluster node imbalance is minimal (bruteforce), or > - the shifted root mean square and maximum resource usages of the cpu > and memory is minimal across the cluster nodes (TOPSIS). > > Signed-off-by: Daniel Kral > --- > changes v1 -> v2: > - add saturating_sub() in remove_running_resource(...) (as suggested by > @Thomas) > - slightly move declarations and impls around so that reading from > top-to-bottom is a little easier > - pass NodeUsage vec instead of NodeStats vec to > calculate_node_imbalance(...) > - pass a closure to calculate_node_imbalance(...) (as suggested by > @Dominik) > - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the > struct is now ordered first by the imbalance and then the strings in > the `Migration` struct > - fix floating-point issue for the imbalance ordering for > ScoredMigration > - correctly implement `Ord` (essentially removing the reverse() and > moving these Reverse() wrappers to the usages for the BinaryHeap) > - use the `Migration` struct in `MigrationCandidate` as well > - drop Scheduler::node_stats() as it's unused now > - use Vec::with_capacity(...) where possible > - eagerly implement common traits (especially Clone and Debug) > - add test cases for the ScoredMigration ordering, node imbalance > calculation and the two rebalancing migration scoring methods > - s/score_best_balancing_migrations > /score_best_balancing_migration_candidates > to possibly allow the Scheduler/Usage impls handling the migration > candidate generation in the future instead of the callers [snip] > +/// Returns the load imbalance among the nodes. > +/// > +/// The load balance is measured as the statistical dispersion of the in= dividual node loads. > +/// > +/// The current implementation uses the dimensionless coefficient of var= iation, which expresses the > +/// standard deviation in relation to the average mean of the node loads= . > +/// > +/// The coefficient of variation is not robust, which is a desired prope= rty here, because outliers > +/// should be detected as much as possible. > +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeU= sage) -> f64) -> f64 { very nice docs!=20 [snip] > +/// A possible migration. > +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deseri= alize)] > +#[serde(rename_all =3D "kebab-case")] > +pub struct Migration { > + /// The identifier of a leading resource. > + pub sid: String, > + /// The current node of the leading resource. > + pub source_node: String, > + /// The possible migration target node for the resource. > + pub target_node: String, nit: on the long run, instead of having `ScoredMigration`,=20 it could be more convenient to have a field: pub imbalance: Option, > +} [snip] > +impl ScoredMigration { > + pub fn new>(migration: T, imbalance: f64) -> Self= { > + // Depending how the imbalance is calculated, it can contain min= or approximation errors. As // Depending [on] how [...] > + // this struct implements the Ord trait, users of the struct's c= mp() can run into cases, > + // where the imbalance is the same up to the significant digits = in base 10, but treated as > + // different values. > + // > + // Therefore, truncate any non-significant digits to prevent the= se cases. > + let factor =3D 10_f64.powf(f64::DIGITS as f64); > + let truncated_imbalance =3D f64::trunc(factor * imbalance) / fac= tor; Nice solution, this appears to be a clean approach to achieve deterministic= `Ord` for `f64`.=20 One small thing, tho: `f64::DIGITS` is technically not a floating number, b= ut `15_u32`. let factor =3D 10_f64.powi(f64::DIGITS as i32); thus, seems to be the better choice here. `powi` is also generally faster t= han `powf` [0].=20 [0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=3DU= sing%20this%20function%20is%20generally%20faster%20than%20using%20powf [snip] > +/// A possible migration candidate with the migrated usage stats. > +#[derive(Clone, Debug)] > +pub struct MigrationCandidate { > + /// The possible migration. > + pub migration: Migration, > + /// The to-be-migrated resource usage stats. imo, easier to comprehend:=20 /// Usage stats of the resource to be migrated > + pub stats: ResourceStats, > +} [snip]