From: Thomas Lamprecht
To: Daniel Kral, pve-devel@lists.proxmox.com
Date: Wed, 18 Mar 2026 17:54:53 +0100
Subject: Re: [RFC PATCH-SERIES many 00/36] dynamic scheduler + load rebalancer
thx for this series!

On Tue, 17 Feb 2026, Daniel Kral wrote:
> proxmox:
>
> Daniel Kral (5):
>   resource-scheduling: move score_nodes_to_start_service to scheduler
>     crate
>   resource-scheduling: introduce generic cluster usage implementation
>   resource-scheduling: add dynamic node and service stats
>   resource-scheduling: implement rebalancing migration selection
>   resource-scheduling: implement Add and Default for
>     {Dynamic,Static}ServiceStats

A few more notes on the proxmox and perl-rs patches, besides what Dominik already pointed out (thx!). I'm posting all of them in a single reply here as I already started out that way, but I can do per-patch replies if you prefer that. Will do the latter for the ha-manager part.

The new scheduler logic seems to have no dedicated unit tests, i.e. the crate only seems to test TOPSIS. It would be nice to have at least basic tests for the imbalance calculation and the migration scoring.

In proxmox patch 1/5, add_cpu_usage becomes pub, but goes back to private in 2/5. Either move the function into scheduler.rs along with its callers, or inline the sentinel logic into add_started_service right away. The commit message also has no body; a short line on why the move is needed wouldn't hurt.

In proxmox patch 2/5, deriving Debug for the new public types for developer convenience (e.g.
ServiceStats, NodeStats, NodeUsage, ClusterUsage) wouldn't hurt.

For proxmox patch 4/5, remove_running_service subtracts usize fields directly. If the dynamic stats are stale or inconsistent, the mem subtraction can panic in debug builds or wrap around in release builds, so it's probably better to use saturating_sub.

Also, load() gives CPU and memory equal weight, but PveTopsisAlternative gives memory 5-10x more weight than CPU. So the brute-force and TOPSIS paths use different ideas of "balance"; either fix that or document why it's fine.

ScoredMigration's Ord only compares the imbalance, so two migrations with the same imbalance but a different source/target compare as Equal, which makes the BinaryHeap output order unpredictable. Maybe use the Migration field, which already implements Ord itself, as a secondary key to break such ties.

For proxmox patch 5/5, a tiny style thing: the Add impl for DynamicServiceStats uses Self as the return type in its signature, while the Add impl for StaticServiceUsage uses Self::Output there. Both resolve to the same type, so it doesn't really matter, but it would still be nice to use one variant for consistency.

For the perl-rs side: pve_dynamic.rs and pve_static.rs are ~90% identical. We already talked about this off-list and you mention it as a low-prio TODO, but given that the Usage struct layout, generate_migration_candidates_from, all four score/select methods, and every node/service management method are nearly the same, IMO it would still be nice and worth it to deduplicate this from the start. E.g. generate_migration_candidates_from and the score/select wrappers should be relatively easy to share, since they only differ in the service stats type.

For perl-rs patch 4/6, CompactMigrationCandidate is introduced inside pve_static, then moved to mod.rs in patch 6/6 once pve_dynamic needs it. Same with the serde import. Better to create the module structure and put the shared type there from the start, so we avoid the back-and-forth.
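For the two proxmox patch 4/5 points on remove_running_service and ScoredMigration's Ord, a minimal sketch of both fixes could look like the following. It assumes that memory is tracked as a usize, that the imbalance is an f64, and that the heap should pop the lowest imbalance first; NodeUsage, Migration, ScoredMigration and pop_order here are simplified, hypothetical stand-ins, not the actual types from the series.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical, simplified node usage: only the usize memory counter matters here.
#[derive(Debug, Default)]
struct NodeUsage {
    mem: usize,
}

impl NodeUsage {
    // Hypothetical signature; the real method works on the service's stats.
    fn remove_running_service(&mut self, service_mem: usize) {
        // Clamp at 0 instead of panicking (debug) or wrapping around (release)
        // when stale or inconsistent stats claim more memory than is accounted.
        self.mem = self.mem.saturating_sub(service_mem);
    }
}

// Hypothetical stand-in for the series' Migration type, which already implements Ord.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Migration {
    sid: String,
    source: String,
    target: String,
}

#[derive(Debug, Clone)]
struct ScoredMigration {
    imbalance: f64, // assumption: the imbalance is stored as an f64
    migration: Migration,
}

impl Ord for ScoredMigration {
    fn cmp(&self, other: &Self) -> Ordering {
        // Primary key: the imbalance, reversed so the max-heap pops the lowest value first.
        other
            .imbalance
            .total_cmp(&self.imbalance)
            // Secondary key: the migration itself (also reversed), so equal
            // imbalances still yield a deterministic pop order.
            .then_with(|| other.migration.cmp(&self.migration))
    }
}

impl PartialOrd for ScoredMigration {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Keep PartialEq/Eq consistent with Ord.
impl PartialEq for ScoredMigration {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for ScoredMigration {}

// Drains the candidates in the order a BinaryHeap yields them.
fn pop_order(candidates: Vec<ScoredMigration>) -> Vec<ScoredMigration> {
    let mut heap: BinaryHeap<ScoredMigration> = candidates.into_iter().collect();
    let mut order = Vec::with_capacity(heap.len());
    while let Some(best) = heap.pop() {
        order.push(best);
    }
    order
}
```

Using f64::total_cmp instead of partial_cmp().unwrap() also keeps a NaN imbalance from panicking inside the comparator.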
In generate_migration_candidates_from (both copies), leader.nodes.iter().next().unwrap() panics if the leader has an empty nodes set. That probably cannot happen in practice, but IMO it's still worth avoiding such unwraps in general and bailing with an error instead.

Typo in the CompactMigrationCandidate doc comment: "MigationCandidate" is missing an 'r', i.e. s/MigationCandidate/MigrationCandidate/

perl-rs patches 1/6, 2/6, and 5/6 have no commit message body. At least for 1/6 and 5/6 a short line on the motivation would be nice, since they restructure the module layout.

That said, overall those two subseries are in good shape for an RFC!
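For the empty-nodes unwrap in generate_migration_candidates_from, one way to turn the panic into an error is a let-else. This is only a sketch under assumptions: Leader, leader_node and describe are hypothetical, the nodes set is assumed to be a BTreeSet<String>, and a plain String error keeps it std-only, whereas in perl-rs itself anyhow's bail! would presumably be the more natural fit.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in; only the leader's `nodes` set matters here.
struct Leader {
    nodes: BTreeSet<String>,
}

// Returns the leader's first node, or an error instead of panicking
// when the nodes set is unexpectedly empty.
fn leader_node(leader: &Leader) -> Result<&str, String> {
    let Some(node) = leader.nodes.iter().next() else {
        // With anyhow this would be e.g. `bail!("leader has an empty nodes set")`.
        return Err("leader has an empty nodes set".to_string());
    };
    Ok(node.as_str())
}

// Callers propagate the error with `?` instead of panicking.
fn describe(leader: &Leader) -> Result<String, String> {
    let node = leader_node(leader)?;
    Ok(format!("leader runs on {node}"))
}
```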