From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Daniel Kral <d.kral@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [RFC PATCH-SERIES many 00/36] dynamic scheduler + load rebalancer
Date: Wed, 18 Mar 2026 17:54:53 +0100
Message-ID: <3fcd4459-e5ff-48ca-8b70-53411a666247@proxmox.com>
In-Reply-To: <20260217141437.584852-1-d.kral@proxmox.com>

thx for this series!

On Tue, 17 Feb 2026, Daniel Kral wrote:
> proxmox:
>
> Daniel Kral (5):
>   resource-scheduling: move score_nodes_to_start_service to scheduler
>     crate
>   resource-scheduling: introduce generic cluster usage implementation
>   resource-scheduling: add dynamic node and service stats
>   resource-scheduling: implement rebalancing migration selection
>   resource-scheduling: implement Add and Default for
>     {Dynamic,Static}ServiceStats

A few more notes on the proxmox and perl-rs patches, besides what
Dominik already pointed out (thx!). Posting all those in a single
reply here as I already started out that way, but I can do per-patch
replies if you prefer that. Will do the latter for the ha-manager
part.

The new scheduler logic seems to have no dedicated unit tests, i.e.
the crate only seems to test TOPSIS. Would be nice to have at least
basic tests for the imbalance calculation and migration scoring.

In proxmox patch 1/5 add_cpu_usage becomes pub, but goes back to
private in 2/5. Either move the function into scheduler.rs along with
its callers, or inline the sentinel logic into add_started_service
right away. The commit message also has no body; a short line on why
the move is needed wouldn't hurt.

In proxmox patch 2/5 deriving Debug for developer convenience for the
new public types (e.g. ServiceStats, NodeStats, NodeUsage, ClusterUsage)
wouldn't hurt.

For proxmox patch 4/5, remove_running_service subtracts usize fields
directly. If dynamic stats are stale or inconsistent, the mem subtraction
can panic in debug builds or wrap around in release builds - probably
better to use saturating_sub.
Also, load() gives CPU and memory equal weight, but PveTopsisAlternative
gives memory 5-10x more weight than CPU. So the brute-force and TOPSIS
paths use different ideas of "balance"; either fix that or document why
it's fine.
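
To illustrate the saturating_sub point, a minimal sketch. The NodeUsage
and ServiceStats types and their field names here are hypothetical
stand-ins, not the ones from the patch, and clamping the CPU value at
zero is likewise just an assumption for the example:

```rust
// Hypothetical stand-ins for the types in the patch; names and fields are assumed.
#[derive(Debug, PartialEq)]
struct NodeUsage {
    cpu: f64,
    mem: usize,
}

#[derive(Debug)]
struct ServiceStats {
    cpu: f64,
    mem: usize,
}

impl NodeUsage {
    /// Remove a service's usage from the node without underflowing.
    fn remove_running_service(&mut self, service: &ServiceStats) {
        // Floats cannot wrap, but clamp at zero to keep the value meaningful.
        self.cpu = (self.cpu - service.cpu).max(0.0);
        // A plain `self.mem - service.mem` would panic in debug builds and
        // wrap around in release builds if the stats are stale.
        self.mem = self.mem.saturating_sub(service.mem);
    }
}

fn main() {
    let mut node = NodeUsage { cpu: 1.0, mem: 1024 };
    // Stale stats that claim more than the node currently accounts for.
    let stale = ServiceStats { cpu: 2.0, mem: 4096 };
    node.remove_running_service(&stale);
    println!("{node:?}");
}
```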

ScoredMigration's Ord only compares imbalance, so two migrations with
the same imbalance but different source/target count as Equal, which
makes the BinaryHeap output order unpredictable. Maybe use the Migration
field, which is already Ord itself, to break any ties here as a secondary
key.
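
Roughly what I have in mind for the tie-break, as a sketch only. The
struct layouts below are assumed, as is imbalance being an f64; whether
the heap should behave as a min- or max-heap is left out on purpose:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Assumed layout; the real Migration type is only known to implement Ord.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Migration {
    sid: String,
    source: String,
    target: String,
}

#[derive(Debug, Clone)]
struct ScoredMigration {
    migration: Migration,
    imbalance: f64,
}

impl Ord for ScoredMigration {
    fn cmp(&self, other: &Self) -> Ordering {
        // Primary key: imbalance. Secondary key: the migration itself,
        // so two distinct migrations never compare as Equal.
        self.imbalance
            .total_cmp(&other.imbalance)
            .then_with(|| self.migration.cmp(&other.migration))
    }
}

impl PartialOrd for ScoredMigration {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Keep equality consistent with the ordering above.
impl PartialEq for ScoredMigration {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for ScoredMigration {}

// Small helper for building test values.
fn scored(sid: &str, source: &str, target: &str, imbalance: f64) -> ScoredMigration {
    ScoredMigration {
        migration: Migration {
            sid: sid.to_string(),
            source: source.to_string(),
            target: target.to_string(),
        },
        imbalance,
    }
}

fn main() {
    let a = scored("vm:100", "node1", "node2", 0.25);
    let b = scored("vm:101", "node1", "node3", 0.25);
    // Same elements, different insertion order.
    let first: BinaryHeap<_> = vec![a.clone(), b.clone()].into_iter().collect();
    let second: BinaryHeap<_> = vec![b, a].into_iter().collect();
    assert_eq!(first.into_sorted_vec(), second.into_sorted_vec());
}
```

With that, the output order no longer depends on the insertion order
for entries with equal imbalance.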

For proxmox patch 5/5, tiny style thing: Add for DynamicServiceStats has
Self as return type in its signature, while the Add impl for
StaticServiceUsage has Self::Output there. Both return Self, so it
doesn't really matter as they resolve to the same thing, but it'd still
be nice to use one variant for consistency.
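
For illustration, one of the two spellings applied consistently might
look like the following. The fields are made up for the example, the
real structs may differ:

```rust
use std::ops::Add;

// Field names are assumptions for this sketch only.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct DynamicServiceStats {
    cpu: f64,
    mem: usize,
}

impl Add for DynamicServiceStats {
    type Output = Self;

    // `Self::Output` and `Self` are the same type here; the point is only
    // to pick one spelling and use it in every Add impl.
    fn add(self, other: Self) -> Self::Output {
        Self {
            cpu: self.cpu + other.cpu,
            mem: self.mem + other.mem,
        }
    }
}

fn main() {
    let a = DynamicServiceStats { cpu: 0.5, mem: 1024 };
    let b = DynamicServiceStats { cpu: 1.5, mem: 2048 };
    println!("{:?}", a + b);
    println!("{:?}", a + DynamicServiceStats::default());
}
```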

For the perl-rs side: pve_dynamic.rs and pve_static.rs are ~90%
identical. We already talked about this off-list and you mentioned this
as a low-prio todo, but given that the Usage struct layout,
generate_migration_candidates_from, all four score/select methods,
and every node/service management method are nearly the same, it
would IMO still be nice and worth it to have this deduplicated from
the start.
E.g. generate_migration_candidates_from and the score/select
wrappers should be relatively easily shared, since they only differ
in the service stats type.
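
A very rough sketch of the direction, with everything in it being
hypothetical (type names, fields and the map layout are invented for
the example). It also does not address whether the perlmod export side
can use a generic type directly or needs thin concrete wrappers:

```rust
use std::collections::BTreeMap;

// Invented stand-ins for the two stats flavors.
#[derive(Debug, Clone, PartialEq)]
struct StaticServiceStats {
    maxcpu: f64,
    maxmem: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct DynamicServiceStats {
    cpu: f64,
    mem: usize,
}

/// Shared bookkeeping, generic over the service stats type.
#[derive(Debug)]
struct Usage<S> {
    // sid -> (node, stats)
    services: BTreeMap<String, (String, S)>,
}

impl<S> Usage<S> {
    fn new() -> Self {
        Self {
            services: BTreeMap::new(),
        }
    }

    fn add_service(&mut self, sid: &str, node: &str, stats: S) {
        self.services
            .insert(sid.to_string(), (node.to_string(), stats));
    }

    /// All services currently accounted to `node`, ordered by sid.
    fn services_on(&self, node: &str) -> Vec<(&str, &S)> {
        self.services
            .iter()
            .filter(|(_, (n, _))| n.as_str() == node)
            .map(|(sid, (_, stats))| (sid.as_str(), stats))
            .collect()
    }
}

// The two binding modules would then differ only in the type parameter.
type StaticUsage = Usage<StaticServiceStats>;
type DynamicUsage = Usage<DynamicServiceStats>;

fn main() {
    let mut dynamic = DynamicUsage::new();
    dynamic.add_service("vm:100", "node1", DynamicServiceStats { cpu: 0.5, mem: 1024 });
    dynamic.add_service("vm:101", "node2", DynamicServiceStats { cpu: 1.0, mem: 2048 });

    let mut fixed = StaticUsage::new();
    fixed.add_service("vm:100", "node1", StaticServiceStats { maxcpu: 2.0, maxmem: 4096 });

    println!("{:?}", dynamic.services_on("node1"));
    println!("{:?}", fixed.services_on("node1"));
}
```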

For perl-rs patch 4/6, CompactMigrationCandidate is introduced inside
pve_static, then moved to mod.rs in patch 6/6 when pve_dynamic needs
it. Same with the serde import. Better to create the module structure
and put the shared type there from the start, so we avoid the
back-and-forth.

In generate_migration_candidates_from (both copies),
leader.nodes.iter().next().unwrap() panics if the leader has an empty
nodes set. That probably cannot happen in practice, but IMO it's still
worth avoiding such unwraps in general and bailing with an error instead.
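
Something along these lines, as a sketch. The Leader struct and the
error type are invented for the example and use std only; in the actual
bindings the existing error handling of the module would be the natural
fit instead of a custom error type:

```rust
use std::collections::HashSet;
use std::fmt;

// Invented stand-in for the leader type in the bindings.
#[derive(Debug)]
struct Leader {
    sid: String,
    nodes: HashSet<String>,
}

// Invented error type, only to keep the sketch free of external crates.
#[derive(Debug, PartialEq)]
struct EmptyLeaderNodes {
    sid: String,
}

impl fmt::Display for EmptyLeaderNodes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "leader '{}' has no nodes", self.sid)
    }
}

impl std::error::Error for EmptyLeaderNodes {}

/// Return some node of the leader, or an error if the set is empty.
fn leader_node(leader: &Leader) -> Result<&str, EmptyLeaderNodes> {
    leader
        .nodes
        .iter()
        .next()
        .map(String::as_str)
        .ok_or_else(|| EmptyLeaderNodes {
            sid: leader.sid.clone(),
        })
}

fn main() {
    let empty = Leader {
        sid: "vm:100".to_string(),
        nodes: HashSet::new(),
    };
    // An error the caller can propagate, instead of a panic.
    match leader_node(&empty) {
        Ok(node) => println!("node: {node}"),
        Err(err) => println!("error: {err}"),
    }
}
```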

Typo in the CompactMigrationCandidate doc comment: "MigationCandidate"
is missing an 'r', i.e. s/MigationCandidate/MigrationCandidate/

perl-rs patches 1/6, 2/6, and 5/6 have no commit message body. At least
for 1/6 and 5/6 a short line on the motivation would be nice, since they
restructure the module layout.

That said, overall those two subseries are in good shape for an RFC!