From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 02F571FF13F for ; Thu, 26 Mar 2026 15:11:14 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id D868814ADC; Thu, 26 Mar 2026 15:11:33 +0100 (CET) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Thu, 26 Mar 2026 15:11:28 +0100 Message-Id: Subject: Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection From: "Daniel Kral" To: "Dominik Rusovac" , X-Mailer: aerc 0.21.0-38-g7088c3642f2c-dirty References: <20260324183029.1274972-1-d.kral@proxmox.com> <20260324183029.1274972-10-d.kral@proxmox.com> In-Reply-To: X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1774534240055 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.094 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment POISEN_SPAM_PILL 0.1 Meta: its spam POISEN_SPAM_PILL_1 0.1 random spam to be learned in bayes POISEN_SPAM_PILL_3 0.1 random spam to be learned in bayes RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: MRCFPBESNFFGTWWYODUZL6TRPITGODWV X-Message-ID-Hash: MRCFPBESNFFGTWWYODUZL6TRPITGODWV X-MailFrom: d.kral@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Thu Mar 26, 2026 at 11:34 AM CET, Dominik Rusovac wrote: > lgtm=20 > > pls find my comments inline, mostly relating to nits or tiny things > > On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote: >> Assuming that a resource will hold the same dynamic resource usage on a >> new node as on the previous node, score possible migrations, where: >> >> - the cluster node imbalance is minimal (bruteforce), or >> - the shifted root mean square and maximum resource usages of the cpu >> and memory is minimal across the cluster nodes (TOPSIS). >> >> Signed-off-by: Daniel Kral >> --- >> changes v1 -> v2: >> - add saturating_sub() in remove_running_resource(...) (as suggested by >> @Thomas) >> - slightly move declarations and impls around so that reading from >> top-to-bottom is a little easier >> - pass NodeUsage vec instead of NodeStats vec to >> calculate_node_imbalance(...) >> - pass a closure to calculate_node_imbalance(...) (as suggested by >> @Dominik) >> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the >> struct is now ordered first by the imbalance and then the strings in >> the `Migration` struct >> - fix floating-point issue for the imbalance ordering for >> ScoredMigration >> - correctly implement `Ord` (essentially removing the reverse() and >> moving these Reverse() wrappers to the usages for the BinaryHeap) >> - use the `Migration` struct in `MigrationCandidate` as well >> - drop Scheduler::node_stats() as it's unused now >> - use Vec::with_capacity(...) where possible >> - eagerly implement common traits (especially Clone and Debug) >> - add test cases for the ScoredMigration ordering, node imbalance >> calculation and the two rebalancing migration scoring methods >> - s/score_best_balancing_migrations >> /score_best_balancing_migration_candidates >> to possibly allow the Scheduler/Usage impls handling the migration >> candidate generation in the future instead of the callers > > [snip] > >> +/// Returns the load imbalance among the nodes. >> +/// >> +/// The load balance is measured as the statistical dispersion of the i= ndividual node loads. >> +/// >> +/// The current implementation uses the dimensionless coefficient of va= riation, which expresses the >> +/// standard deviation in relation to the average mean of the node load= s. >> +/// >> +/// The coefficient of variation is not robust, which is a desired prop= erty here, because outliers >> +/// should be detected as much as possible. >> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&Node= Usage) -> f64) -> f64 { > > very nice docs!=20 > > [snip] > >> +/// A possible migration. >> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deser= ialize)] >> +#[serde(rename_all =3D "kebab-case")] >> +pub struct Migration { >> + /// The identifier of a leading resource. >> + pub sid: String, >> + /// The current node of the leading resource. >> + pub source_node: String, >> + /// The possible migration target node for the resource. >> + pub target_node: String, > > nit: on the long run, instead of having `ScoredMigration`,=20 > it could be more convenient to have a field: > > pub imbalance: Option, > Might make sense, but then we can't reuse the same structure in `MigrationCandidate` anymore, so I would let ScoredMigration be it's own type, what do you think? >> +} > > [snip] > >> +impl ScoredMigration { >> + pub fn new>(migration: T, imbalance: f64) -> Sel= f { >> + // Depending how the imbalance is calculated, it can contain mi= nor approximation errors. As > > // Depending [on] how [...] Thanks, will change that! > >> + // this struct implements the Ord trait, users of the struct's = cmp() can run into cases, >> + // where the imbalance is the same up to the significant digits= in base 10, but treated as >> + // different values. >> + // >> + // Therefore, truncate any non-significant digits to prevent th= ese cases. >> + let factor =3D 10_f64.powf(f64::DIGITS as f64); >> + let truncated_imbalance =3D f64::trunc(factor * imbalance) / fa= ctor; > > Nice solution, this appears to be a clean approach to achieve determinist= ic `Ord` > for `f64`.=20 > > One small thing, tho: `f64::DIGITS` is technically not a floating number,= but `15_u32`. > > let factor =3D 10_f64.powi(f64::DIGITS as i32); > > thus, seems to be the better choice here. `powi` is also generally faster= than `powf` [0].=20 > > [0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text= =3DUsing%20this%20function%20is%20generally%20faster%20than%20using%20powf Thanks, good catch! I also weren't aware of them being non-deterministic here, that's good to know. We briefly also talked about this off-list, I'll adapt the test cases below to not eq() the ScoredMigration directly as the truncation here might be non-deterministic (and I assumed otherwise). More-so, it's more important to verify the order in which the migrations are scored than their exact imbalance score as you suggested. > > [snip] > >> +/// A possible migration candidate with the migrated usage stats. >> +#[derive(Clone, Debug)] >> +pub struct MigrationCandidate { >> + /// The possible migration. >> + pub migration: Migration, >> + /// The to-be-migrated resource usage stats. > > imo, easier to comprehend:=20 > > /// Usage stats of the resource to be migrated Nice, will change it to that! > >> + pub stats: ResourceStats, >> +} > > [snip]