From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 01B201FF137 for ; Tue, 31 Mar 2026 15:32:09 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 7AB071E0E5; Tue, 31 Mar 2026 15:32:36 +0200 (CEST) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Tue, 31 Mar 2026 15:32:02 +0200 Message-Id: From: "Daniel Kral" To: =?utf-8?q?Michael_K=C3=B6ppl?= , Subject: Re: [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection X-Mailer: aerc 0.21.0-38-g7088c3642f2c-dirty References: <20260330144101.668747-1-d.kral@proxmox.com> <20260330144101.668747-10-d.kral@proxmox.com> In-Reply-To: X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1774963867272 X-SPAM-LEVEL: Spam detection results: 0 AWL -1.428 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 1 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 1 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 1 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Message-ID-Hash: QG2MGYSHD3FPL6ELYNLWZ4USYICCMPFG X-Message-ID-Hash: QG2MGYSHD3FPL6ELYNLWZ4USYICCMPFG X-MailFrom: d.kral@proxmox.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; loop; banned-address; emergency; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.10 Precedence: list List-Id: Proxmox VE development discussion List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Tue Mar 31, 2026 at 2:42 PM CEST, Michael K=C3=B6ppl wrote: > On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote: > > [snip] > >> =20 >> + /// Adds the resource stats to the node stats as if the resource is= running on the node. >> + pub fn add_running_resource(&mut self, resource_stats: &ResourceSta= ts) { >> + self.cpu +=3D resource_stats.cpu; >> + self.mem +=3D resource_stats.mem; >> + } >> + >> + /// Removes the resource stats from the node stats as if the resour= ce is not running on the node. >> + pub fn remove_running_resource(&mut self, resource_stats: &Resource= Stats) { >> + self.cpu -=3D resource_stats.cpu; > > From what I can gather, due to how the stats are collected, it could > occur here that self.cpu < 0. I think it could make sense to do > something like > > self.cpu =3D f64::max(0.0, self.cpu - resource_stats.cpu); > > here to avoid it affecting the node imbalance calculation. > Right, thanks! Even though this shouldn't happen regularly, because for - the static case node.cpu is made up of all the already running resources 'cpu' properties summed, and - the dynamic case node.cpu >=3D (sum of resources 'cpu'), as it should at least include the reported resources 'cpu' and more because of other running resources on the host (e.g. pve backend, zfs, etc.), this would be a safer option to do regardless. I did a quick benchmark with a theoretical 10,000 resources 48 nodes cluster and it did not drastically change the runtime for the worse AFAICT. I'll wait on more feedback for a v4 or send it as a follow-up depending on how we apply these patches. >> + self.mem =3D self.mem.saturating_sub(resource_stats.mem); >> + } >> + >> /// Returns the current cpu usage as a percentage. >> pub fn cpu_load(&self) -> f64 { >> self.cpu / self.maxcpu as f64 >> @@ -38,6 +50,11 @@ impl NodeStats { >> pub fn mem_load(&self) -> f64 { >> self.mem as f64 / self.maxmem as f64 >> } >> + >> + /// Returns a combined node usage as a percentage. >> + pub fn load(&self) -> f64 { >> + (self.cpu_load() + self.mem_load()) / 2.0 >> + } >> } > > [snip]