* [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:10 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
` (38 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This makes the move of the function into its own module in the next
patch easier to follow; that move is in turn needed to generalize
score_nodes_to_start_service(...) for other usage stats in the
following patches.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/pve_static.rs | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index b81086dd..fd5e5ffc 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -94,7 +94,11 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
for (index, node) in nodes.iter().enumerate() {
let node = node.as_ref();
let new_cpu = if index == target_index {
- add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
+ if service.maxcpu == 0.0 {
+ node.cpu + node.maxcpu as f64
+ } else {
+ node.cpu + service.maxcpu
+ }
} else {
node.cpu
} / (node.maxcpu as f64);
--
2.47.3
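As a standalone illustration (not part of the patch itself), the inlined computation treats a service `maxcpu` of `0.0` as "unlimited" and charges the node's full core count instead of the service's CPU limit; the helper name below is hypothetical:

```rust
// Hypothetical standalone sketch of the inlined CPU-usage computation.
// A service `maxcpu` of 0.0 means "unlimited", so the whole node's core
// count is added instead of the service's CPU limit.
fn new_cpu_fraction(node_cpu: f64, node_maxcpu: f64, service_maxcpu: f64) -> f64 {
    let added = if service_maxcpu == 0.0 {
        node_cpu + node_maxcpu // unlimited: assume full node utilization
    } else {
        node_cpu + service_maxcpu
    };
    added / node_maxcpu // normalize to a fraction of the node's cores
}

fn main() {
    // 2-core service on a 16-core node already using 1.0 cores: (1 + 2) / 16
    assert!((new_cpu_fraction(1.0, 16.0, 2.0) - 0.1875).abs() < 1e-12);
    // unlimited service on the same node: (1 + 16) / 16
    assert!((new_cpu_fraction(1.0, 16.0, 0.0) - 1.0625).abs() < 1e-12);
    println!("ok");
}
```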
^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-26 10:10 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:10 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This makes moving the function out into its own module easier to follow,
> which in turn is needed to generalize score_nodes_to_start_service(...)
> for other usage stats in the following patches.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> proxmox-resource-scheduling/src/pve_static.rs | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
> index b81086dd..fd5e5ffc 100644
> --- a/proxmox-resource-scheduling/src/pve_static.rs
> +++ b/proxmox-resource-scheduling/src/pve_static.rs
> @@ -94,7 +94,11 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
> for (index, node) in nodes.iter().enumerate() {
> let node = node.as_ref();
> let new_cpu = if index == target_index {
> - add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
> + if service.maxcpu == 0.0 {
> + node.cpu + node.maxcpu as f64
> + } else {
> + node.cpu + service.maxcpu
> + }
> } else {
> node.cpu
> } / (node.maxcpu as f64);
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:11 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
` (37 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is done so score_nodes_to_start_service(...) can be generalized in
the following patches, allowing other usage stat structs to reuse the
same scoring method.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add patch message
- do not change visibility of pve_static::add_cpu_usage() (hence the
  inlining of the code in the previous patch)
proxmox-resource-scheduling/src/lib.rs | 2 +
proxmox-resource-scheduling/src/pve_static.rs | 76 +---------------
proxmox-resource-scheduling/src/scheduler.rs | 90 +++++++++++++++++++
3 files changed, 94 insertions(+), 74 deletions(-)
create mode 100644 proxmox-resource-scheduling/src/scheduler.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 47980259..c73e7b1e 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,4 +1,6 @@
#[macro_use]
pub mod topsis;
+pub mod scheduler;
+
pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index fd5e5ffc..5df0be37 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,7 +1,7 @@
use anyhow::Error;
use serde::{Deserialize, Serialize};
-use crate::topsis;
+use crate::scheduler;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -53,23 +53,6 @@ pub struct StaticServiceUsage {
pub maxmem: usize,
}
-criteria_struct! {
- /// A given alternative.
- struct PveTopsisAlternative {
- #[criterion("average CPU", -1.0)]
- average_cpu: f64,
- #[criterion("highest CPU", -2.0)]
- highest_cpu: f64,
- #[criterion("average memory", -5.0)]
- average_memory: f64,
- #[criterion("highest memory", -10.0)]
- highest_memory: f64,
- }
-
- const N_CRITERIA;
- static PVE_HA_TOPSIS_CRITERIA;
-}
-
/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
/// and CPU usages of the nodes as if the service would already be running on each.
///
@@ -79,60 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- let len = nodes.len();
-
- let matrix = nodes
- .iter()
- .enumerate()
- .map(|(target_index, _)| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for (index, node) in nodes.iter().enumerate() {
- let node = node.as_ref();
- let new_cpu = if index == target_index {
- if service.maxcpu == 0.0 {
- node.cpu + node.maxcpu as f64
- } else {
- node.cpu + service.maxcpu
- }
- } else {
- node.cpu
- } / (node.maxcpu as f64);
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = if index == target_index {
- node.mem + service.maxmem
- } else {
- node.mem
- } as f64
- / node.maxmem as f64;
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
- .into()
- })
- .collect::<Vec<_>>();
-
- let scores =
- topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
- Ok(scores
- .into_iter()
- .enumerate()
- .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
- .collect())
+ scheduler::score_nodes_to_start_service(nodes, service)
}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
new file mode 100644
index 00000000..385015e3
--- /dev/null
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -0,0 +1,90 @@
+use anyhow::Error;
+
+use crate::{
+ pve_static::{StaticNodeUsage, StaticServiceUsage},
+ topsis,
+};
+
+criteria_struct! {
+ /// A given alternative.
+ struct PveTopsisAlternative {
+ #[criterion("average CPU", -1.0)]
+ average_cpu: f64,
+ #[criterion("highest CPU", -2.0)]
+ highest_cpu: f64,
+ #[criterion("average memory", -5.0)]
+ average_memory: f64,
+ #[criterion("highest memory", -10.0)]
+ highest_memory: f64,
+ }
+
+ const N_CRITERIA;
+ static PVE_HA_TOPSIS_CRITERIA;
+}
+
+/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the service would already be running on each.
+///
+/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
+/// is better.
+pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+ nodes: &[T],
+ service: &StaticServiceUsage,
+) -> Result<Vec<(String, f64)>, Error> {
+ let len = nodes.len();
+
+ let matrix = nodes
+ .iter()
+ .enumerate()
+ .map(|(target_index, _)| {
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for (index, node) in nodes.iter().enumerate() {
+ let node = node.as_ref();
+ let new_cpu = if index == target_index {
+ if service.maxcpu == 0.0 {
+ node.cpu + node.maxcpu as f64
+ } else {
+ node.cpu + service.maxcpu
+ }
+ } else {
+ node.cpu
+ } / (node.maxcpu as f64);
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = if index == target_index {
+ node.mem + service.maxmem
+ } else {
+ node.mem
+ } as f64
+ / node.maxmem as f64;
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let scores =
+ topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(scores
+ .into_iter()
+ .enumerate()
+ .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
+ .collect())
+}
--
2.47.3
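The "Add 1.0" comment in the moved code can be checked numerically; a standalone sketch (not part of the patch), showing how the offset keeps tiny absolute load differences from being boosted into large relative differences:

```rust
// Standalone illustration of the "Add 1.0" offset in the scoring code:
// without it, tiny absolute differences in load turn into large relative
// differences between alternatives.
fn main() {
    let (low, high) = (0.002_f64, 0.004_f64);
    // Raw ratio: 0.004 looks twice as loaded as 0.002.
    assert!((high / low - 2.0).abs() < 1e-12);
    // With the offset, the ratio shrinks to just above 1.0.
    let ratio = (1.0 + high) / (1.0 + low);
    assert!(ratio > 1.0 && ratio < 1.01);
    println!("ok");
}
```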
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:12 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
` (36 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The term `resource` is more appropriate with respect to the crate name
and is also the preferred name for the current main application in the
HA Manager.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/pve_static.rs | 2 +-
proxmox-resource-scheduling/src/scheduler.rs | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index 5df0be37..c7e1d1b1 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -62,5 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- scheduler::score_nodes_to_start_service(nodes, service)
+ scheduler::score_nodes_to_start_resource(nodes, service)
}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 385015e3..39ee44ce 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -22,14 +22,14 @@ criteria_struct! {
static PVE_HA_TOPSIS_CRITERIA;
}
-/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the service would already be running on each.
+/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the resource would already be running on each.
///
/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
/// is better.
-pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
- service: &StaticServiceUsage,
+ resource: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
let len = nodes.len();
@@ -46,10 +46,10 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
for (index, node) in nodes.iter().enumerate() {
let node = node.as_ref();
let new_cpu = if index == target_index {
- if service.maxcpu == 0.0 {
+ if resource.maxcpu == 0.0 {
node.cpu + node.maxcpu as f64
} else {
- node.cpu + service.maxcpu
+ node.cpu + resource.maxcpu
}
} else {
node.cpu
@@ -58,7 +58,7 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
squares_cpu += new_cpu.powi(2);
let new_mem = if index == target_index {
- node.mem + service.maxmem
+ node.mem + resource.maxmem
} else {
node.mem
} as f64
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (2 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:19 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
` (35 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The existing score_nodes_to_start_resource(...) function depends on the
StaticNodeUsage and StaticServiceUsage structs.
To use this function for other usage stats structs as well, declare
generic NodeStats and ResourceStats structs that callers can convert
their own types into. These are used to make
score_nodes_to_start_resource(...) and its documentation generic.
The pve_static::score_nodes_to_start_service(...) is marked as
deprecated accordingly. The usage-related structs are marked as
deprecated as well, since the specific usage implementations - including
their serialization and deserialization - should be handled by the
caller now.
This is best viewed with the git option --ignore-all-space.
No functional changes intended.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch was "[RFC proxmox 2/5] resource-scheduling: introduce generic
cluster usage implementation" in the RFC v1. Sorry for the ambiguous
naming with the next patch!
changes v1 -> v2:
- add more information to the patch message
- split out `NodeStats` and `ResourceStats` to their own modules, which
will also be used in an upcoming patch for the `Usage` implementation
- add deprecation note to pve_static::score_nodes_to_start_service()
- add deprecation attribute to other major pve_static items as well
- impl `Add` and `Sum` for ResourceStats, which will be used for the
resource bundling in pve-rs later
- eagerly implement common traits (especially Clone and Debug)
- add test cases for the
scheduler::Scheduler::score_nodes_to_start_resource()
proxmox-resource-scheduling/src/lib.rs | 6 +
proxmox-resource-scheduling/src/node.rs | 39 ++++
proxmox-resource-scheduling/src/pve_static.rs | 46 +++-
proxmox-resource-scheduling/src/resource.rs | 33 +++
proxmox-resource-scheduling/src/scheduler.rs | 157 ++++++++------
.../tests/scheduler.rs | 200 ++++++++++++++++++
6 files changed, 408 insertions(+), 73 deletions(-)
create mode 100644 proxmox-resource-scheduling/src/node.rs
create mode 100644 proxmox-resource-scheduling/src/resource.rs
create mode 100644 proxmox-resource-scheduling/tests/scheduler.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index c73e7b1e..12b743fe 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,6 +1,12 @@
#[macro_use]
pub mod topsis;
+pub mod node;
+pub mod resource;
+
pub mod scheduler;
+// pve_static exists only for backwards compatibility to not break builds
+// The allow(deprecated) is to not report its own use of deprecated items
+#[allow(deprecated)]
pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
new file mode 100644
index 00000000..e6227eda
--- /dev/null
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -0,0 +1,39 @@
+use crate::resource::ResourceStats;
+
+/// Usage statistics of a node.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct NodeStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Total number of CPU cores.
+ pub maxcpu: usize,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Total memory in bytes.
+ pub maxmem: usize,
+}
+
+impl NodeStats {
+ /// Adds the resource stats to the node stats as if the resource has started on the node.
+ pub fn add_started_resource(&mut self, resource_stats: &ResourceStats) {
+ // a maxcpu value of `0.0` means no cpu usage limit on the node
+ let resource_cpu = if resource_stats.maxcpu == 0.0 {
+ self.maxcpu as f64
+ } else {
+ resource_stats.maxcpu
+ };
+
+ self.cpu += resource_cpu;
+ self.mem += resource_stats.maxmem;
+ }
+
+ /// Returns the current cpu usage as a percentage.
+ pub fn cpu_load(&self) -> f64 {
+ self.cpu / self.maxcpu as f64
+ }
+
+ /// Returns the current memory usage as a percentage.
+ pub fn mem_load(&self) -> f64 {
+ self.mem as f64 / self.maxmem as f64
+ }
+}
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index c7e1d1b1..229ee3c6 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,10 +1,12 @@
use anyhow::Error;
use serde::{Deserialize, Serialize};
-use crate::scheduler;
+use crate::scheduler::{NodeUsage, Scheduler};
+use crate::{node::NodeStats, resource::ResourceStats};
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
+#[deprecated = "specific node usage structs should be declared where they are used"]
/// Static usage information of a node.
pub struct StaticNodeUsage {
/// Hostname of the node.
@@ -33,6 +35,22 @@ impl AsRef<StaticNodeUsage> for StaticNodeUsage {
}
}
+impl From<StaticNodeUsage> for NodeUsage {
+ fn from(usage: StaticNodeUsage) -> Self {
+ let stats = NodeStats {
+ cpu: usage.cpu,
+ maxcpu: usage.maxcpu,
+ mem: usage.mem,
+ maxmem: usage.maxmem,
+ };
+
+ Self {
+ name: usage.name,
+ stats,
+ }
+ }
+}
+
/// Calculate new CPU usage in percent.
/// `add` being `0.0` means "unlimited" and results in `max` being added.
fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
@@ -43,8 +61,9 @@ fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
}
}
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Copy, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
+#[deprecated = "specific service usage structs should be declared where they are used"]
/// Static usage information of an HA resource.
pub struct StaticServiceUsage {
/// Number of assigned CPUs or CPU limit.
@@ -53,14 +72,33 @@ pub struct StaticServiceUsage {
pub maxmem: usize,
}
+impl From<StaticServiceUsage> for ResourceStats {
+ fn from(usage: StaticServiceUsage) -> Self {
+ Self {
+ cpu: usage.maxcpu,
+ maxcpu: usage.maxcpu,
+ mem: usage.maxmem,
+ maxmem: usage.maxmem,
+ }
+ }
+}
+
/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
/// and CPU usages of the nodes as if the service would already be running on each.
///
/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
/// is better.
+#[deprecated = "use Scheduler::score_nodes_to_start_resource(...) directly instead"]
pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- scheduler::score_nodes_to_start_resource(nodes, service)
+ let nodes = nodes
+ .iter()
+ .map(|node| node.as_ref().clone().into())
+ .collect::<Vec<NodeUsage>>();
+
+ let scheduler = Scheduler::from_nodes(nodes);
+
+ scheduler.score_nodes_to_start_resource(*service)
}
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
new file mode 100644
index 00000000..1eb9d15e
--- /dev/null
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -0,0 +1,33 @@
+use std::{iter::Sum, ops::Add};
+
+/// Usage statistics for a resource.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct ResourceStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+}
+
+impl Add for ResourceStats {
+ type Output = Self;
+
+ fn add(self, other: Self) -> Self {
+ Self {
+ cpu: self.cpu + other.cpu,
+ maxcpu: self.maxcpu + other.maxcpu,
+ mem: self.mem + other.mem,
+ maxmem: self.maxmem + other.maxmem,
+ }
+ }
+}
+
+impl Sum for ResourceStats {
+ fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
+ iter.fold(Self::default(), |a, b| a + b)
+ }
+}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 39ee44ce..bb38f238 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -1,9 +1,15 @@
use anyhow::Error;
-use crate::{
- pve_static::{StaticNodeUsage, StaticServiceUsage},
- topsis,
-};
+use crate::{node::NodeStats, resource::ResourceStats, topsis};
+
+/// The scheduler view of a node.
+#[derive(Clone, Debug)]
+pub struct NodeUsage {
+ /// The identifier of the node.
+ pub name: String,
+ /// The usage statistics of the node.
+ pub stats: NodeStats,
+}
criteria_struct! {
/// A given alternative.
@@ -22,69 +28,82 @@ criteria_struct! {
static PVE_HA_TOPSIS_CRITERIA;
}
-/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the resource would already be running on each.
-///
-/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
-/// is better.
-pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
- nodes: &[T],
- resource: &StaticServiceUsage,
-) -> Result<Vec<(String, f64)>, Error> {
- let len = nodes.len();
-
- let matrix = nodes
- .iter()
- .enumerate()
- .map(|(target_index, _)| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for (index, node) in nodes.iter().enumerate() {
- let node = node.as_ref();
- let new_cpu = if index == target_index {
- if resource.maxcpu == 0.0 {
- node.cpu + node.maxcpu as f64
- } else {
- node.cpu + resource.maxcpu
- }
- } else {
- node.cpu
- } / (node.maxcpu as f64);
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = if index == target_index {
- node.mem + resource.maxmem
- } else {
- node.mem
- } as f64
- / node.maxmem as f64;
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
- .into()
- })
- .collect::<Vec<_>>();
-
- let scores =
- topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
- Ok(scores
- .into_iter()
- .enumerate()
- .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
- .collect())
+pub struct Scheduler {
+ nodes: Vec<NodeUsage>,
+}
+
+impl Scheduler {
+ /// Instantiate scheduler instance from node usages.
+ pub fn from_nodes<I>(nodes: I) -> Self
+ where
+ I: IntoIterator<Item: Into<NodeUsage>>,
+ {
+ Self {
+ nodes: nodes.into_iter().map(|node| node.into()).collect(),
+ }
+ }
+
+ /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
+ ///
+ /// The scoring is done as if the resource is already started on each node. This assumes that
+ /// the already started resource consumes the maximum amount of each stat according to its
+ /// `resource_stats`.
+ ///
+ /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
+ /// score is better.
+ pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
+ &self,
+ resource_stats: T,
+ ) -> Result<Vec<(String, f64)>, Error> {
+ let len = self.nodes.len();
+ let resource_stats = resource_stats.into();
+
+ let matrix = self
+ .nodes
+ .iter()
+ .enumerate()
+ .map(|(target_index, _)| {
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for (index, node) in self.nodes.iter().enumerate() {
+ let mut new_stats = node.stats;
+
+ if index == target_index {
+ new_stats.add_started_resource(&resource_stats)
+ };
+
+ let new_cpu = new_stats.cpu_load();
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = new_stats.mem_load();
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let scores =
+ topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(scores
+ .into_iter()
+ .enumerate()
+ .map(|(n, score)| (self.nodes[n].name.to_string(), score))
+ .collect())
+ }
}
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
new file mode 100644
index 00000000..c7a9dab9
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -0,0 +1,200 @@
+use anyhow::Error;
+use proxmox_resource_scheduling::{
+ node::NodeStats,
+ resource::ResourceStats,
+ scheduler::{NodeUsage, Scheduler},
+};
+
+fn new_homogeneous_cluster_scheduler() -> Scheduler {
+ let (maxcpu, maxmem) = (16, 64 * (1 << 30));
+
+ let node1 = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu,
+ mem: 12334 << 20,
+ maxmem,
+ },
+ };
+
+ let node2 = NodeUsage {
+ name: String::from("node2"),
+ stats: NodeStats {
+ cpu: 15.184,
+ maxcpu,
+ mem: 529 << 20,
+ maxmem,
+ },
+ };
+
+ let node3 = NodeUsage {
+ name: String::from("node3"),
+ stats: NodeStats {
+ cpu: 5.2,
+ maxcpu,
+ mem: 9381 << 20,
+ maxmem,
+ },
+ };
+
+ Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn new_heterogeneous_cluster_scheduler() -> Scheduler {
+ let node1 = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu: 16,
+ mem: 12334 << 20,
+ maxmem: 128 << 30,
+ },
+ };
+
+ let node2 = NodeUsage {
+ name: String::from("node2"),
+ stats: NodeStats {
+ cpu: 15.184,
+ maxcpu: 32,
+ mem: 529 << 20,
+ maxmem: 96 << 30,
+ },
+ };
+
+ let node3 = NodeUsage {
+ name: String::from("node3"),
+ stats: NodeStats {
+ cpu: 5.2,
+ maxcpu: 24,
+ mem: 9381 << 20,
+ maxmem: 64 << 30,
+ },
+ };
+
+ Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn rank_nodes_to_start_resource(
+ scheduler: &Scheduler,
+ resource_stats: ResourceStats,
+) -> Result<Vec<String>, Error> {
+ let mut alternatives = scheduler.score_nodes_to_start_resource(resource_stats)?;
+
+ alternatives.sort_by(|a, b| b.1.total_cmp(&a.1));
+
+ Ok(alternatives
+ .iter()
+ .map(|alternative| alternative.0.to_string())
+ .collect())
+}
+
+#[test]
+fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ let heavy_memory_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 1.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+ vec!["node2", "node3", "node1"]
+ );
+
+ let heavy_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 0.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node2", "node3", "node1"]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ let heavy_memory_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 1.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+ vec!["node2", "node1", "node3"]
+ );
+
+ let heavy_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+ vec!["node3", "node2", "node1"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 0.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node2", "node1", "node3"]
+ );
+
+ Ok(())
+}
--
2.47.3
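The `Add`/`Sum` impls added for ResourceStats make bundling a set of resources a one-liner via `.sum()`. A minimal standalone re-implementation sketch (field-wise summation, mirroring the patch; the struct here is a local copy, not the crate's type):

```rust
use std::{iter::Sum, ops::Add};

// Minimal standalone re-implementation of the ResourceStats Add/Sum impls
// from the patch, showing how bundled resources sum field-wise.
#[derive(Copy, Clone, Debug, Default, PartialEq)]
struct ResourceStats {
    cpu: f64,
    maxcpu: f64,
    mem: usize,
    maxmem: usize,
}

impl Add for ResourceStats {
    type Output = Self;
    fn add(self, other: Self) -> Self {
        Self {
            cpu: self.cpu + other.cpu,
            maxcpu: self.maxcpu + other.maxcpu,
            mem: self.mem + other.mem,
            maxmem: self.maxmem + other.maxmem,
        }
    }
}

impl Sum for ResourceStats {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        iter.fold(Self::default(), |a, b| a + b)
    }
}

fn main() {
    // Bundle two resources, e.g. a VM and its helper container.
    let bundle: ResourceStats = [
        ResourceStats { cpu: 0.5, maxcpu: 2.0, mem: 1 << 30, maxmem: 2 << 30 },
        ResourceStats { cpu: 1.5, maxcpu: 4.0, mem: 2 << 30, maxmem: 4 << 30 },
    ]
    .into_iter()
    .sum();

    assert_eq!(bundle.maxcpu, 6.0);
    assert_eq!(bundle.maxmem, 6 << 30);
    println!("ok");
}
```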
^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-26 10:19 ` Dominik Rusovac
2026-03-26 14:16 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:19 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly nits.
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The existing score_nodes_to_start_resource(...) function is dependent on
> the StaticNodeUsage and StaticServiceUsage structs.
>
> To use this function for other usage stats structs as well, declare
> generic NodeStats and ResourceStats structs, that the users can convert
> into. These are used to make score_nodes_to_start_resource(...) and its
> documentation generic.
>
> The pve_static::score_nodes_to_start_service(...) is marked as
> deprecated accordingly. The usage-related structs are marked as
> deprecated as well as the specific usage implementations - including
> their serialization and deserialization - should be handled by the
> caller now.
>
> This is best viewed with the git option --ignore-all-space.
>
> No functional changes intended.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> This patch was "[RFC proxmox 2/5] resource-scheduling: introduce generic
> cluster usage implementation" in the RFC v1. Sorry for the ambiguous
> naming with the next patch!
>
> changes v1 -> v2:
> - add more information to the patch message
> - split out `NodeStats` and `ResourceStats` to their own modules, which
> will also be used in an upcoming patch for the `Usage` implementation
> - add deprecation note to pve_static::score_nodes_to_start_service()
> - add deprecation attribute to other major pve_static items as well
> - impl `Add` and `Sum` for ResourceStats, which will be used for the
> resource bundling in pve-rs later
> - eagerly implement common traits (especially Clone and Debug)
> - add test cases for the
> scheduler::Scheduler::score_nodes_to_start_resource()
[snip]
good to have:
#[derive(Clone, Debug)]
> +pub struct Scheduler {
> + nodes: Vec<NodeUsage>,
> +}
> +
nit: The implementation of `Scheduler` is totally fine as-is. This is
just my two cents, as this was mentioned off-list.
I believe that for the implementation of the scheduler working with
enum variants and a trait and then exploiting static dispatch is more
convenient and easier to maintain, e.g.:
pub enum Schedulerr<Nodes> {
Topsis(Nodes),
BruteForce(Nodes),
}
pub trait Decide {
fn node_imbalance(&self) -> f64;
fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
fn score_best_balancing_migration_candidates(
&self,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error>;
}
impl Decide for Schedulerr<Vec<NodeUsage>> {
fn node_imbalance(&self) -> f64 {
match self {
Self::Topsis(nodes) | Self::BruteForce(nodes) => {
calculate_node_imbalance(nodes, |node| node.stats.load())
}
}
}
fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64 {
match self {
Self::Topsis(nodes) | Self::BruteForce(nodes) => {
calculate_node_imbalance(nodes, |node| {
let mut new_stats = node.stats;
if node.name == candidate.migration.source_node {
new_stats.remove_running_resource(&candidate.stats);
} else if node.name == candidate.migration.target_node {
new_stats.add_running_resource(&candidate.stats);
}
new_stats.load()
})
}
}
}
fn score_best_balancing_migration_candidates(
&self,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error> {
match self {
Self::Topsis(nodes) => {
let len = nodes.len();
let matrix = candidates
.iter()
.map(|candidate| {
let resource_stats = &candidate.stats;
let source_node = &candidate.migration.source_node;
let target_node = &candidate.migration.target_node;
let mut highest_cpu = 0.0;
let mut squares_cpu = 0.0;
let mut highest_mem = 0.0;
let mut squares_mem = 0.0;
for node in nodes.iter() {
let new_stats = {
let mut new_stats = node.stats;
if &node.name == source_node {
new_stats.remove_running_resource(resource_stats);
} else if &node.name == target_node {
new_stats.add_running_resource(resource_stats);
}
new_stats
};
let new_cpu = new_stats.cpu_load();
highest_cpu = f64::max(highest_cpu, new_cpu);
squares_cpu += new_cpu.powi(2);
let new_mem = new_stats.mem_load();
highest_mem = f64::max(highest_mem, new_mem);
squares_mem += new_mem.powi(2);
}
// Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
// 1.004 is only slightly more than 1.002.
PveTopsisAlternative {
average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
highest_cpu: 1.0 + highest_cpu,
average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
highest_memory: 1.0 + highest_mem,
}
.into()
})
.collect::<Vec<_>>();
let best_alternatives = topsis::rank_alternatives(
&topsis::Matrix::new(matrix)?,
&PVE_HA_TOPSIS_CRITERIA,
)?;
Ok(best_alternatives
.into_iter()
.take(limit)
.map(|i| {
let imbalance =
self.node_imbalance_with_migration_candidate(&candidates[i]);
ScoredMigration::new(candidates[i].clone(), imbalance)
})
.collect())
}
Self::BruteForce(_) => {
let mut scored_migrations = candidates
.iter()
.map(|candidate| {
let imbalance = self.node_imbalance_with_migration_candidate(candidate);
// NOTE: could avoid clone if Migration had additional score field
Reverse(ScoredMigration::new(candidate.clone(), imbalance))
})
.collect::<BinaryHeap<_>>();
let mut best_migrations = Vec::with_capacity(limit);
// BinaryHeap::into_iter_sorted() is still in nightly unfortunately
while best_migrations.len() < limit {
match scored_migrations.pop() {
Some(Reverse(alternative)) => best_migrations.push(alternative),
None => break,
}
}
Ok(best_migrations)
}
}
}
}
pub fn score_best_balancing_migration_candidates(
scheduler: &impl Decide,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error> {
scheduler.score_best_balancing_migration_candidates(candidates, limit)
}
In a nutshell, this declares what a scheduler ought to be able to do to be used
for scoring (that is, implementing the `Decide` trait); and implements
all the functionality for all the variants in one place.
Nice side-effects of this design:
* one scoring function implements all the variants in one place, which is nice, I think
* adding/removing a scheduler variant would become more systematic
* modifying scheduler variants in terms of how they score or, for example, how they measure
imbalance, would also be more straightforward
Again, just my two cents.
> +impl Scheduler {
> + /// Instantiate scheduler instance from node usages.
> + pub fn from_nodes<I>(nodes: I) -> Self
> + where
> + I: IntoIterator<Item: Into<NodeUsage>>,
> + {
> + Self {
> + nodes: nodes.into_iter().map(|node| node.into()).collect(),
> + }
> + }
> +
> + /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
> + ///
> + /// The scoring is done as if the resource is already started on each node. This assumes that
> + /// the already started resource consumes the maximum amount of each stat according to its
> + /// `resource_stats`.
> + ///
> + /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
> + /// score is better.
> + pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
> + &self,
> + resource_stats: T,
> + ) -> Result<Vec<(String, f64)>, Error> {
> + let len = self.nodes.len();
> + let resource_stats = resource_stats.into();
> +
> + let matrix = self
> + .nodes
> + .iter()
> + .enumerate()
> + .map(|(target_index, _)| {
> + // Base values on percentages to allow comparing nodes with different stats.
> + let mut highest_cpu = 0.0;
> + let mut squares_cpu = 0.0;
> + let mut highest_mem = 0.0;
> + let mut squares_mem = 0.0;
> +
> + for (index, node) in self.nodes.iter().enumerate() {
> + let mut new_stats = node.stats;
> +
> + if index == target_index {
> + new_stats.add_started_resource(&resource_stats)
> + };
> +
> + let new_cpu = new_stats.cpu_load();
> + highest_cpu = f64::max(highest_cpu, new_cpu);
> + squares_cpu += new_cpu.powi(2);
> +
> + let new_mem = new_stats.mem_load();
> + highest_mem = f64::max(highest_mem, new_mem);
> + squares_mem += new_mem.powi(2);
> + }
> +
> + // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
> + // 1.004 is only slightly more than 1.002.
> + PveTopsisAlternative {
> + average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
> + highest_cpu: 1.0 + highest_cpu,
> + average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
> + highest_memory: 1.0 + highest_mem,
> + }
> + .into()
> + })
> + .collect::<Vec<_>>();
> +
> + let scores =
> + topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
> +
> + Ok(scores
> + .into_iter()
> + .enumerate()
> + .map(|(n, score)| (self.nodes[n].name.to_string(), score))
> + .collect())
> + }
> }
[snip]
in the future, the proptest crate could come in handy for such helper
functions.
in general, I would propose to add proptests in the future.
for some examples and inspiration, see [0].
[0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/
> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
> + let (maxcpu, maxmem) = (16, 64 * (1 << 30));
> +
> + let node1 = NodeUsage {
> + name: String::from("node1"),
> + stats: NodeStats {
> + cpu: 1.7,
> + maxcpu,
> + mem: 12334 << 20,
> + maxmem,
> + },
> + };
> +
> + let node2 = NodeUsage {
> + name: String::from("node2"),
> + stats: NodeStats {
> + cpu: 15.184,
> + maxcpu,
> + mem: 529 << 20,
> + maxmem,
> + },
> + };
> +
> + let node3 = NodeUsage {
> + name: String::from("node3"),
> + stats: NodeStats {
> + cpu: 5.2,
> + maxcpu,
> + mem: 9381 << 20,
> + maxmem,
> + },
> + };
> +
> + Scheduler::from_nodes(vec![node1, node2, node3])
> +}
> +
> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
> + let node1 = NodeUsage {
> + name: String::from("node1"),
> + stats: NodeStats {
> + cpu: 1.7,
> + maxcpu: 16,
> + mem: 12334 << 20,
> + maxmem: 128 << 30,
> + },
> + };
> +
> + let node2 = NodeUsage {
> + name: String::from("node2"),
> + stats: NodeStats {
> + cpu: 15.184,
> + maxcpu: 32,
> + mem: 529 << 20,
> + maxmem: 96 << 30,
> + },
> + };
> +
> + let node3 = NodeUsage {
> + name: String::from("node3"),
> + stats: NodeStats {
> + cpu: 5.2,
> + maxcpu: 24,
> + mem: 9381 << 20,
> + maxmem: 64 << 30,
> + },
> + };
> +
> + Scheduler::from_nodes(vec![node1, node2, node3])
> +}
[snip]
> +#[test]
> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
> + let scheduler = new_homogeneous_cluster_scheduler();
> +
> + let heavy_memory_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 1.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
> + vec!["node2", "node3", "node1"]
> + );
> +
> + let heavy_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 0.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
nit: confusing variable name
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
nit: confusing variable name
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node2", "node3", "node1"]
> + );
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
> + let scheduler = new_heterogeneous_cluster_scheduler();
> +
> + let heavy_memory_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 1.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
> + vec!["node2", "node1", "node3"]
> + );
> +
> + let heavy_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
> + vec!["node3", "node2", "node1"]
> + );
> +
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 0.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
nit: confusing variable name
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
nit: confusing variable name
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node2", "node1", "node3"]
> + );
> +
> + Ok(())
> +}
^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-26 10:19 ` Dominik Rusovac
@ 2026-03-26 14:16 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:16 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:19 AM CET, Dominik Rusovac wrote:
> [snip]
>
> good to have:
>
> #[derive(Clone, Debug)]
ACK, thanks!
>
>> +pub struct Scheduler {
>> + nodes: Vec<NodeUsage>,
>> +}
>> +
>
> nit: The implementation of `Scheduler` is totally fine as-is. This is
> just my two cents, as this was mentioned off-list.
>
> I believe that for the implementation of the scheduler working with
> enum variants and a trait and then exploiting static dispatch is more
> convenient and easier to maintain, e.g.:
>
> pub enum Schedulerr<Nodes> {
> Topsis(Nodes),
> BruteForce(Nodes),
> }
>
> pub trait Decide {
> fn node_imbalance(&self) -> f64;
>
> fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
>
> fn score_best_balancing_migration_candidates(
> &self,
> candidates: &[MigrationCandidate],
> limit: usize,
> ) -> Result<Vec<ScoredMigration>, Error>;
> }
>
> impl Decide for Schedulerr<Vec<NodeUsage>> {
[...]
> }
>
> pub fn score_best_balancing_migration_candidates(
> scheduler: &impl Decide,
> candidates: &[MigrationCandidate],
> limit: usize,
> ) -> Result<Vec<ScoredMigration>, Error> {
> scheduler.score_best_balancing_migration_candidates(candidates, limit)
> }
>
> In a nutshell, this declares what a scheduler ought to be able to do to be used
> for scoring (that is, implementing the `Decide` trait); and implements
> all the functionality for all the variants in one place.
>
> Nice side-effects of this design:
> * one scoring function implements all the variants in one place, which is nice, I think
> * adding/removing a scheduler variant would become more systematic
> * modifying scheduler variants in terms of how they score or, for example, how they measure
> imbalance, would also be more straightforward
>
> Again, just my two cents.
As discussed off-list, I like how readable the different method
alternatives are with the pattern matching and that makes maintenance
and reading diffs easier.
Though I'm not sure whether it is a good idea to define the `Scheduler`
by the algorithms that one or more of its methods use. Choosing one
algorithm might only be something we want in the short term, e.g., while
we figure out whether TOPSIS or brute force is the right fit for users;
we might drop brute force or TOPSIS for these methods in the future, or
add another method which uses yet another algorithm.
Then it might be better that the methods themselves, such as
score_nodes_to_start_resource() and
score_best_balancing_migration_candidates() defines which algorithm
these use. In that case it is only coincidental that two methods use the
same algorithm internally.
Maybe we could still go for some parameter that allows passing the
preferred algorithm to score_best_balancing_migration_candidates(),
which for now can either be BruteForce or Topsis?
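To make that idea concrete, a rough sketch of such a parameter could look
like the following (all names here are illustrative stand-ins, not the
actual crate API; both match arms are placeholders that just sort
ascending):

```rust
/// Illustrative only: which algorithm the balancing method should use.
#[derive(Copy, Clone, Debug)]
pub enum BalancingAlgorithm {
    Topsis,
    BruteForce,
}

/// Simplified stand-in for the real Scheduler; node usages are elided.
pub struct Scheduler;

impl Scheduler {
    /// Only this method is coupled to the algorithm choice: the caller
    /// passes the preferred algorithm instead of it being baked into the
    /// Scheduler type. Candidates are plain f64 scores here for brevity.
    pub fn score_best_candidates(
        &self,
        algorithm: BalancingAlgorithm,
        candidates: &[f64],
        limit: usize,
    ) -> Vec<f64> {
        let mut scored = candidates.to_vec();
        match algorithm {
            // Placeholder: the real code would rank via TOPSIS.
            BalancingAlgorithm::Topsis => {
                scored.sort_by(|a, b| a.partial_cmp(b).unwrap());
            }
            // Placeholder: the real code would score exhaustively via a
            // BinaryHeap; here both branches just sort ascending.
            BalancingAlgorithm::BruteForce => {
                scored.sort_by(|a, b| a.partial_cmp(b).unwrap());
            }
        }
        scored.truncate(limit);
        scored
    }
}
```

That would keep a single Scheduler type while still letting the caller
decide per call which algorithm fits.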
Still, thanks a lot for the suggestion! I like the idea a lot, but I'm
only a little unsure whether it is the right fit in this situation wrt.
to future changes.
>
>> +impl Scheduler {
>> + /// Instantiate scheduler instance from node usages.
>> + pub fn from_nodes<I>(nodes: I) -> Self
>> + where
>> + I: IntoIterator<Item: Into<NodeUsage>>,
>> + {
>> + Self {
>> + nodes: nodes.into_iter().map(|node| node.into()).collect(),
>> + }
>> + }
>> +
>> + /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
>> + ///
>> + /// The scoring is done as if the resource is already started on each node. This assumes that
>> + /// the already started resource consumes the maximum amount of each stat according to its
>> + /// `resource_stats`.
>> + ///
>> + /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
>> + /// score is better.
>> + pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
>> + &self,
>> + resource_stats: T,
>> + ) -> Result<Vec<(String, f64)>, Error> {
>> + let len = self.nodes.len();
>> + let resource_stats = resource_stats.into();
>> +
>> + let matrix = self
>> + .nodes
>> + .iter()
>> + .enumerate()
>> + .map(|(target_index, _)| {
>> + // Base values on percentages to allow comparing nodes with different stats.
>> + let mut highest_cpu = 0.0;
>> + let mut squares_cpu = 0.0;
>> + let mut highest_mem = 0.0;
>> + let mut squares_mem = 0.0;
>> +
>> + for (index, node) in self.nodes.iter().enumerate() {
>> + let mut new_stats = node.stats;
>> +
>> + if index == target_index {
>> + new_stats.add_started_resource(&resource_stats)
>> + };
>> +
>> + let new_cpu = new_stats.cpu_load();
>> + highest_cpu = f64::max(highest_cpu, new_cpu);
>> + squares_cpu += new_cpu.powi(2);
>> +
>> + let new_mem = new_stats.mem_load();
>> + highest_mem = f64::max(highest_mem, new_mem);
>> + squares_mem += new_mem.powi(2);
>> + }
>> +
>> + // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
>> + // 1.004 is only slightly more than 1.002.
>> + PveTopsisAlternative {
>> + average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
>> + highest_cpu: 1.0 + highest_cpu,
>> + average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
>> + highest_memory: 1.0 + highest_mem,
>> + }
>> + .into()
>> + })
>> + .collect::<Vec<_>>();
>> +
>> + let scores =
>> + topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
>> +
>> + Ok(scores
>> + .into_iter()
>> + .enumerate()
>> + .map(|(n, score)| (self.nodes[n].name.to_string(), score))
>> + .collect())
>> + }
>> }
>
> [snip]
>
> in the future, the proptest crate could come in handy for such helper
> functions.
>
> in general, I would propose to add proptests in the future.
>
> for some examples and inspiration, see [0].
>
> [0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/
+1
>
>> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
>> + let (maxcpu, maxmem) = (16, 64 * (1 << 30));
>> +
>> + let node1 = NodeUsage {
>> + name: String::from("node1"),
>> + stats: NodeStats {
>> + cpu: 1.7,
>> + maxcpu,
>> + mem: 12334 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + let node2 = NodeUsage {
>> + name: String::from("node2"),
>> + stats: NodeStats {
>> + cpu: 15.184,
>> + maxcpu,
>> + mem: 529 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + let node3 = NodeUsage {
>> + name: String::from("node3"),
>> + stats: NodeStats {
>> + cpu: 5.2,
>> + maxcpu,
>> + mem: 9381 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>> +
>> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
>> + let node1 = NodeUsage {
>> + name: String::from("node1"),
>> + stats: NodeStats {
>> + cpu: 1.7,
>> + maxcpu: 16,
>> + mem: 12334 << 20,
>> + maxmem: 128 << 30,
>> + },
>> + };
>> +
>> + let node2 = NodeUsage {
>> + name: String::from("node2"),
>> + stats: NodeStats {
>> + cpu: 15.184,
>> + maxcpu: 32,
>> + mem: 529 << 20,
>> + maxmem: 96 << 30,
>> + },
>> + };
>> +
>> + let node3 = NodeUsage {
>> + name: String::from("node3"),
>> + stats: NodeStats {
>> + cpu: 5.2,
>> + maxcpu: 24,
>> + mem: 9381 << 20,
>> + maxmem: 64 << 30,
>> + },
>> + };
>> +
>> + Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>
> [snip]
>
>> +#[test]
>> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> + let scheduler = new_homogeneous_cluster_scheduler();
>> +
>> + let heavy_memory_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 1.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> + vec!["node2", "node3", "node1"]
>> + );
>> +
>> + let heavy_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 0.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>
> nit: confusing variable name
ACK this and the following, that was a copy-paste error unfortunately.
>
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>
> nit: confusing variable name
ACK
>
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node2", "node3", "node1"]
>> + );
>> +
>> + Ok(())
>> +}
>> +
>> +#[test]
>> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> + let scheduler = new_heterogeneous_cluster_scheduler();
>> +
>> + let heavy_memory_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 1.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> + vec!["node2", "node1", "node3"]
>> + );
>> +
>> + let heavy_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> + vec!["node3", "node2", "node1"]
>> + );
>> +
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 0.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>
> nit: confusing variable name
>
ACK
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>
> nit: confusing variable name
>
ACK
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node2", "node1", "node3"]
>> + );
>> +
>> + Ok(())
>> +}
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (3 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:28 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
` (34 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is a more generic version of the `Usage` implementation from the
pve_static bindings in the pve_rs repository.
As the upcoming load balancing scheduler actions and dynamic resource
scheduler will need more information about each resource, this further
improves on the state tracking of each resource:
In this implementation, a resource is composed of its usage statistics
and its two essential states: the running state and the node placement.
The non_exhaustive attribute ensures that users need to construct a
Resource instance through its API.
Users can repeatedly use the current state of Usage to make scheduling
decisions with the to_scheduler() method. This method takes an
implementation of UsageAggregator, which dictates how the usage
information is represented to the Scheduler.
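Roughly, the intended split looks like the following sketch (simplified
stand-ins for illustration; the real types carry full cpu/memory
statistics and placement state, and to_scheduler() here returns the
aggregated node usages directly instead of a Scheduler):

```rust
use std::collections::HashMap;

// Simplified stand-in: nodename -> a single load value.
#[derive(Default)]
pub struct Usage {
    nodes: HashMap<String, f64>,
}

pub struct NodeUsage {
    pub name: String,
    pub load: f64,
}

/// Dictates how the cluster usage is represented to the scheduler.
pub trait UsageAggregator {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
}

/// Example aggregator that passes node loads through unchanged.
pub struct IdentityAggregator;

impl UsageAggregator for IdentityAggregator {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
        usage
            .nodes
            .iter()
            .map(|(name, load)| NodeUsage {
                name: name.clone(),
                load: *load,
            })
            .collect()
    }
}

impl Usage {
    pub fn add_node(&mut self, name: String, load: f64) {
        self.nodes.insert(name, load);
    }

    /// Build the scheduler input from the current state; callable
    /// repeatedly as the state evolves.
    pub fn to_scheduler<A: UsageAggregator>(&self) -> Vec<NodeUsage> {
        A::aggregate(self)
    }
}
```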
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
This patch is added to move the handling of specific usage stats and
their (de)serialization to the pve-rs bindings and have the general
functionality in this crate.
proxmox-resource-scheduling/src/lib.rs | 1 +
proxmox-resource-scheduling/src/node.rs | 40 +++++
proxmox-resource-scheduling/src/resource.rs | 119 +++++++++++++
proxmox-resource-scheduling/src/usage.rs | 183 ++++++++++++++++++++
proxmox-resource-scheduling/tests/usage.rs | 153 ++++++++++++++++
5 files changed, 496 insertions(+)
create mode 100644 proxmox-resource-scheduling/src/usage.rs
create mode 100644 proxmox-resource-scheduling/tests/usage.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 12b743fe..99ca16d8 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -3,6 +3,7 @@ pub mod topsis;
pub mod node;
pub mod resource;
+pub mod usage;
pub mod scheduler;
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index e6227eda..be462782 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -1,3 +1,5 @@
+use std::collections::HashSet;
+
use crate::resource::ResourceStats;
/// Usage statistics of a node.
@@ -37,3 +39,41 @@ impl NodeStats {
self.mem as f64 / self.maxmem as f64
}
}
+
+/// A node in the cluster context.
+#[derive(Clone, Debug)]
+pub struct Node {
+ /// Base stats of the node.
+ stats: NodeStats,
+ /// The identifiers of the resources assigned to the node.
+ resources: HashSet<String>,
+}
+
+impl Node {
+ pub fn new(stats: NodeStats) -> Self {
+ Self {
+ stats,
+ resources: HashSet::new(),
+ }
+ }
+
+ pub fn add_resource(&mut self, sid: &str) -> bool {
+ self.resources.insert(sid.to_string())
+ }
+
+ pub fn remove_resource(&mut self, sid: &str) -> bool {
+ self.resources.remove(sid)
+ }
+
+ pub fn stats(&self) -> NodeStats {
+ self.stats
+ }
+
+ pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
+ self.resources.iter()
+ }
+
+ pub fn contains_resource(&self, sid: &str) -> bool {
+ self.resources.contains(sid)
+ }
+}
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
index 1eb9d15e..2aa16a51 100644
--- a/proxmox-resource-scheduling/src/resource.rs
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -1,5 +1,7 @@
use std::{iter::Sum, ops::Add};
+use anyhow::{bail, Error};
+
/// Usage statistics for a resource.
#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
pub struct ResourceStats {
@@ -31,3 +33,120 @@ impl Sum for ResourceStats {
iter.fold(Self::default(), |a, b| a + b)
}
}
+
+/// Execution state of a resource.
+#[derive(Copy, Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourceState {
+ /// The resource is stopped.
+ Stopped,
+ /// The resource is scheduled to start.
+ Starting,
+ /// The resource is started and currently running.
+ Started,
+}
+
+/// Placement of a resource.
+#[derive(Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourcePlacement {
+ /// The resource is on `current_node`.
+ Stationary { current_node: String },
+ /// The resource is being moved from `current_node` to `target_node`.
+ Moving {
+ current_node: String,
+ target_node: String,
+ },
+}
+
+impl ResourcePlacement {
+ fn nodenames(&self) -> Vec<&str> {
+ match self {
+ ResourcePlacement::Stationary { current_node } => vec![&current_node],
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => vec![&current_node, &target_node],
+ }
+ }
+}
+
+/// A resource in the cluster context.
+#[derive(Clone, Debug)]
+#[non_exhaustive]
+pub struct Resource {
+ /// The usage statistics of the resource.
+ stats: ResourceStats,
+ /// The execution state of the resource.
+ state: ResourceState,
+ /// The placement of the resource.
+ placement: ResourcePlacement,
+}
+
+impl Resource {
+ pub fn new(stats: ResourceStats, state: ResourceState, placement: ResourcePlacement) -> Self {
+ Self {
+ stats,
+ state,
+ placement,
+ }
+ }
+
+ /// Put the resource into a moving state with `target_node`.
+ ///
+ /// This method fails if the resource is already moving.
+ pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
+ match &self.placement {
+ ResourcePlacement::Stationary { current_node } => {
+ self.placement = ResourcePlacement::Moving {
+ current_node: current_node.to_string(),
+ target_node,
+ };
+ }
+ ResourcePlacement::Moving { .. } => bail!("resource is already moving"),
+ };
+
+ Ok(())
+ }
+
+ /// Handles the external removal of a node.
+ ///
+ /// Returns whether the resource does not have any node left.
+ pub fn remove_node(&mut self, nodename: &str) -> bool {
+ match &self.placement {
+ ResourcePlacement::Stationary { current_node } => current_node == nodename,
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => {
+ if current_node == nodename {
+ self.placement = ResourcePlacement::Stationary {
+ current_node: target_node.to_string(),
+ };
+ } else if target_node == nodename {
+ self.placement = ResourcePlacement::Stationary {
+ current_node: current_node.to_string(),
+ };
+ }
+
+ false
+ }
+ }
+ }
+
+ pub fn state(&self) -> ResourceState {
+ self.state
+ }
+
+ pub fn stats(&self) -> ResourceStats {
+ self.stats
+ }
+
+ pub fn placement(&self) -> &ResourcePlacement {
+ &self.placement
+ }
+
+ pub fn nodenames(&self) -> Vec<&str> {
+ self.placement.nodenames()
+ }
+}
diff --git a/proxmox-resource-scheduling/src/usage.rs b/proxmox-resource-scheduling/src/usage.rs
new file mode 100644
index 00000000..78ccc453
--- /dev/null
+++ b/proxmox-resource-scheduling/src/usage.rs
@@ -0,0 +1,183 @@
+use anyhow::{bail, Error};
+
+use std::collections::HashMap;
+
+use crate::{
+ node::{Node, NodeStats},
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ scheduler::{NodeUsage, Scheduler},
+};
+
+/// The state of the usage in the cluster.
+///
+/// The cluster usage represents the current state of the assignments between nodes and resources
+/// and their usage statistics. A resource can be placed on these nodes according to their
+/// placement state. See [`crate::resource::Resource`] for more information.
+///
+/// The cluster usage state can be used to build a current state for the [`Scheduler`].
+#[derive(Default)]
+pub struct Usage {
+ nodes: HashMap<String, Node>,
+ resources: HashMap<String, Resource>,
+}
+
+/// An aggregator for the [`Usage`] maps the cluster usage to node usage statistics that are
+/// relevant for the scheduler.
+pub trait UsageAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
+}
+
+impl Usage {
+ /// Instantiate an empty cluster usage.
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Add a node to the cluster usage.
+ ///
+ /// This method fails if a node with the same `nodename` already exists.
+ pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
+ if self.nodes.contains_key(&nodename) {
+ bail!("node '{}' already exists", nodename);
+ }
+
+ self.nodes.insert(nodename, Node::new(stats));
+
+ Ok(())
+ }
+
+ /// Remove a node from the cluster usage.
+ pub fn remove_node(&mut self, nodename: &str) {
+ if let Some(node) = self.nodes.remove(nodename) {
+ node.resources_iter().for_each(|sid| {
+ if let Some(resource) = self.resources.get_mut(sid)
+ && resource.remove_node(nodename)
+ {
+ self.resources.remove(sid);
+ }
+ });
+ }
+ }
+
+ /// Returns a reference to the [`Node`] with the identifier `nodename`.
+ pub fn get_node(&self, nodename: &str) -> Option<&Node> {
+ self.nodes.get(nodename)
+ }
+
+ /// Returns an iterator for the cluster usage's nodes.
+ pub fn nodes_iter(&self) -> impl Iterator<Item = (&String, &Node)> {
+ self.nodes.iter()
+ }
+
+ /// Returns an iterator over the cluster usage's node names.
+ pub fn nodenames_iter(&self) -> impl Iterator<Item = &String> {
+ self.nodes.keys()
+ }
+
+ /// Returns whether the node with the identifier `nodename` is present in the cluster usage.
+ pub fn contains_node(&self, nodename: &str) -> bool {
+ self.nodes.contains_key(nodename)
+ }
+
+ fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
+ if nodenames
+ .iter()
+ .any(|nodename| !self.nodes.contains_key(*nodename))
+ {
+ bail!("resource nodes do not exist");
+ }
+
+ nodenames.iter().for_each(|nodename| {
+ if let Some(node) = self.nodes.get_mut(*nodename) {
+ node.add_resource(sid);
+ }
+ });
+
+ Ok(())
+ }
+
+ fn remove_resource_from_nodes(&mut self, sid: &str, nodenames: &[&str]) {
+ nodenames.iter().for_each(|nodename| {
+ if let Some(node) = self.nodes.get_mut(*nodename) {
+ node.remove_resource(sid);
+ }
+ });
+ }
+
+ /// Add `resource` with identifier `sid` to cluster usage.
+ ///
+ /// This method fails if a resource with the same `sid` already exists or the resource's nodes
+ /// do not exist in the cluster usage.
+ pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
+ if self.resources.contains_key(&sid) {
+ bail!("resource '{}' already exists", sid);
+ }
+
+ self.add_resource_to_nodes(&sid, resource.nodenames())?;
+
+ self.resources.insert(sid.to_string(), resource);
+
+ Ok(())
+ }
+
+ /// Add `stats` from resource with identifier `sid` to node `nodename` in cluster usage.
+ ///
+ /// For the first call, the resource is assumed to be started and stationary on the given node.
+ /// If there was no intermediate call to remove the resource, the second call will assume that
+ /// the given node is the target node and the resource is being moved there. The second call
+ /// will ignore the value of `stats`.
+ #[deprecated = "only for backwards compatibility, use add_resource(...) instead"]
+ pub fn add_resource_usage_to_node(
+ &mut self,
+ nodename: &str,
+ sid: &str,
+ stats: ResourceStats,
+ ) -> Result<(), Error> {
+ if let Some(resource) = self.resources.get_mut(sid) {
+ resource.moving_to(nodename.to_string())?;
+
+ self.add_resource_to_nodes(sid, vec![nodename])
+ } else {
+ let placement = ResourcePlacement::Stationary {
+ current_node: nodename.to_string(),
+ };
+ let resource = Resource::new(stats, ResourceState::Started, placement);
+
+ self.add_resource(sid.to_string(), resource)
+ }
+ }
+
+ /// Remove resource with identifier `sid` from cluster usage.
+ pub fn remove_resource(&mut self, sid: &str) {
+ if let Some(resource) = self.resources.remove(sid) {
+ match resource.placement() {
+ ResourcePlacement::Stationary { current_node } => {
+ self.remove_resource_from_nodes(sid, &[current_node]);
+ }
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => {
+ self.remove_resource_from_nodes(sid, &[current_node, target_node]);
+ }
+ }
+ }
+ }
+
+ /// Returns a reference to the [`Resource`] with the identifier `sid`.
+ pub fn get_resource(&self, sid: &str) -> Option<&Resource> {
+ self.resources.get(sid)
+ }
+
+ /// Returns an iterator for the cluster usage's resources.
+ pub fn resources_iter(&self) -> impl Iterator<Item = (&String, &Resource)> {
+ self.resources.iter()
+ }
+
+ /// Use the current cluster usage as a base for a scheduling action.
+ pub fn to_scheduler<F: UsageAggregator>(&self) -> Scheduler {
+ let node_usages = F::aggregate(self);
+
+ Scheduler::from_nodes(node_usages)
+ }
+}
diff --git a/proxmox-resource-scheduling/tests/usage.rs b/proxmox-resource-scheduling/tests/usage.rs
new file mode 100644
index 00000000..eb00d2c6
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/usage.rs
@@ -0,0 +1,153 @@
+use anyhow::{bail, Error};
+use proxmox_resource_scheduling::{
+ node::NodeStats,
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ usage::Usage,
+};
+
+#[test]
+fn test_no_duplicate_nodes() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+
+ match usage.add_node("node1".to_string(), NodeStats::default()) {
+ Ok(_) => bail!("cluster usage does allow duplicate node entries"),
+ Err(_) => Ok(()),
+ }
+}
+
+#[test]
+fn test_no_duplicate_resources() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Stationary {
+ current_node: "node1".to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource("vm:101".to_string(), resource.clone())?;
+
+ match usage.add_resource("vm:101".to_string(), resource) {
+ Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
+ Err(_) => Ok(()),
+ }
+}
+
+#[test]
+#[allow(deprecated)]
+fn test_add_resource_usage_to_node() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+ usage.add_node("node2".to_string(), NodeStats::default())?;
+ usage.add_node("node3".to_string(), NodeStats::default())?;
+
+ usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
+ usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
+
+ if usage
+ .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
+ .is_ok()
+ {
+ bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
+ }
+
+ Ok(())
+}
+
+#[test]
+fn test_add_remove_stationary_resource() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ let (sid, nodename) = ("vm:101", "node1");
+
+ usage.add_node(nodename.to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Stationary {
+ current_node: nodename.to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource(sid.to_string(), resource)?;
+
+ match (usage.get_resource(sid), usage.get_node(nodename)) {
+ (Some(_), Some(node)) => {
+ if !node.contains_resource(sid) {
+ bail!("resource '{sid}' was not added to node '{nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
+ }
+
+ usage.remove_resource(sid);
+
+ match (usage.get_resource(sid), usage.get_node(nodename)) {
+ (None, Some(node)) => {
+ if node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from node '{nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' was not removed"),
+ }
+
+ Ok(())
+}
+
+#[test]
+fn test_add_remove_moving_resource() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
+
+ usage.add_node(current_nodename.to_string(), NodeStats::default())?;
+ usage.add_node(target_nodename.to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Moving {
+ current_node: current_nodename.to_string(),
+ target_node: target_nodename.to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource(sid.to_string(), resource)?;
+
+ match (
+ usage.get_resource(sid),
+ usage.get_node(current_nodename),
+ usage.get_node(target_nodename),
+ ) {
+ (Some(_), Some(current_node), Some(target_node)) => {
+ if !current_node.contains_resource("vm:101") {
+ bail!("resource '{sid}' was not added to current node '{current_nodename}'");
+ }
+
+ if !target_node.contains_resource("vm:101") {
+ bail!("resource '{sid}' was not added to target node '{target_nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' or nodes were not added"),
+ }
+
+ usage.remove_resource(sid);
+
+ match (
+ usage.get_resource(sid),
+ usage.get_node(current_nodename),
+ usage.get_node(target_nodename),
+ ) {
+ (None, Some(current_node), Some(target_node)) => {
+ if current_node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
+ }
+
+ if target_node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' was not removed"),
+ }
+
+ Ok(())
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-26 10:28 ` Dominik Rusovac
2026-03-26 14:15 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:28 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly relating to nits or tiny things
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that users need to construct a
> Resource instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> This patch is added to move the handling of specific usage stats and
> their (de)serialization to the pve-rs bindings and have the general
> functionality in this crate.
[snip]
nit: imo, it's more convenient to expose the more ergonomic `&str` type,
using:
pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
self.resources.iter().map(String::as_str)
}
> + pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
> + self.resources.iter()
> + }
[snip]
> + pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => {
> + self.placement = ResourcePlacement::Moving {
> + current_node: current_node.to_string(),
nit:
current_node: current_node.to_owned(),
represents the intention best, that is, owning rather than converting
[snip]
> + /// Handles the external removal of a node.
> + ///
> + /// Returns whether the resource does not have any node left.
Considering what it does, I find the name of this function a bit confusing.
> + pub fn remove_node(&mut self, nodename: &str) -> bool {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => current_node == nodename,
> + ResourcePlacement::Moving {
> + current_node,
> + target_node,
> + } => {
> + if current_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: target_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + } else if target_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: current_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + }
> +
> + false
> + }
> + }
> + }
[snip]
> + /// Add a node to the cluster usage.
> + ///
> + /// This method fails if a node with the same `nodename` already exists.
> + pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
> + if self.nodes.contains_key(&nodename) {
> + bail!("node '{}' already exists", nodename);
nit:
bail!("node '{nodename}' already exists");
> + }
[snip]
we are reading only, consider using a slice for `nodenames` here (just
like for `remove_resource_from_nodes`):
fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {
pls find the related changes [0] and [1].
> + fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
> + if nodenames
> + .iter()
> + .any(|nodename| !self.nodes.contains_key(*nodename))
> + {
> + bail!("resource nodes do not exist");
> + }
> +
> + nodenames.iter().for_each(|nodename| {
> + if let Some(node) = self.nodes.get_mut(*nodename) {
> + node.add_resource(sid);
> + }
> + });
> +
> + Ok(())
> + }
[snip]
> + /// Add `resource` with identifier `sid` to cluster usage.
> + ///
> + /// This method fails if a resource with the same `sid` already exists or the resource's nodes
> + /// do not exist in the cluster usage.
> + pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
> + if self.resources.contains_key(&sid) {
> + bail!("resource '{}' already exists", sid);
> + }
> +
> + self.add_resource_to_nodes(&sid, resource.nodenames())?;
[0]:
self.add_resource_to_nodes(&sid, &resource.nodenames())?;
> +
> + self.resources.insert(sid.to_string(), resource);
nit: to_owned() instead of to_string() represents the intention best
[snip]
> + pub fn add_resource_usage_to_node(
> + &mut self,
> + nodename: &str,
> + sid: &str,
> + stats: ResourceStats,
> + ) -> Result<(), Error> {
> + if let Some(resource) = self.resources.get_mut(sid) {
> + resource.moving_to(nodename.to_string())?;
> +
> + self.add_resource_to_nodes(sid, vec![nodename])
[1]:
self.add_resource_to_nodes(sid, &[nodename])
[snip]
> +#[test]
> +fn test_no_duplicate_nodes() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + match usage.add_node("node1".to_string(), NodeStats::default()) {
> + Ok(_) => bail!("cluster usage does allow duplicate node entries"),
> + Err(_) => Ok(()),
> + }
since this is supposed to be a test case, I would rather assert instead
of bail, using:
assert!(
usage
.add_node("node1".to_string(), NodeStats::default())
.is_err(),
"cluster usage allows duplicate node entries"
);
> +}
> +
> +#[test]
> +fn test_no_duplicate_resources() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: "node1".to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource("vm:101".to_string(), resource.clone())?;
> +
> + match usage.add_resource("vm:101".to_string(), resource) {
> + Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
> + Err(_) => Ok(()),
> + }
assert instead of bail:
assert!(
usage.add_resource("vm:101".to_string(), resource).is_err(),
"cluster usage allows duplicate resource entries"
);
> +}
> +
> +#[test]
> +#[allow(deprecated)]
> +fn test_add_resource_usage_to_node() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> + usage.add_node("node2".to_string(), NodeStats::default())?;
> + usage.add_node("node3".to_string(), NodeStats::default())?;
> +
> + usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
> + usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
> +
> + if usage
> + .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
> + .is_ok()
> + {
> + bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
> + }
assert instead of bail:
assert!(
usage
.add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
.is_err(),
"add_resource_usage_to_node() allows adding resource to more than two nodes"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_stationary_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, nodename) = ("vm:101", "node1");
> +
> + usage.add_node(nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (Some(_), Some(node)) => {
> + if !node.contains_resource(sid) {
> + bail!("resource '{sid}' was not added to node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_some(),
"resource '{sid}' was not added"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
node.contains_resource(sid),
"resource '{sid}' was not added to node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + usage.remove_resource(sid);
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (None, Some(node)) => {
> + if node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_none(),
"resource '{sid}' was not removed"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
!node.contains_resource(sid),
"resource '{sid}' was not removed from node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_moving_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
> +
> + usage.add_node(current_nodename.to_string(), NodeStats::default())?;
> + usage.add_node(target_nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Moving {
> + current_node: current_nodename.to_string(),
> + target_node: target_nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
analogously, here I'd find asserting more appropriate than bailing
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (Some(_), Some(current_node), Some(target_node)) => {
> + if !current_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to current node '{current_nodename}'");
> + }
> +
> + if !target_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or nodes were not added"),
> + }
> +
> + usage.remove_resource(sid);
analogously, here I'd find asserting more appropriate than bailing
> +
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (None, Some(current_node), Some(target_node)) => {
> + if current_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
> + }
> +
> + if target_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
> +
> + Ok(())
> +}
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-26 10:28 ` Dominik Rusovac
@ 2026-03-26 14:15 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:15 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:28 AM CET, Dominik Rusovac wrote:
> lgtm
>
> pls find my comments inline, mostly relating to nits or tiny things
>
> On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
>> This is a more generic version of the `Usage` implementation from the
>> pve_static bindings in the pve_rs repository.
>>
>> As the upcoming load balancing scheduler actions and dynamic resource
>> scheduler will need more information about each resource, this further
>> improves on the state tracking of each resource:
>>
>> In this implementation, a resource is composed of its usage statistics
>> and its two essential states: the running state and the node placement.
>> The non_exhaustive attribute ensures that users need to construct a
>> Resource instance through its API.
>>
>> Users can repeatedly use the current state of Usage to make scheduling
>> decisions with the to_scheduler() method. This method takes an
>> implementation of UsageAggregator, which dictates how the usage
>> information is represented to the Scheduler.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> changes v1 -> v2:
>> - new!
>>
>> This patch is added to move the handling of specific usage stats and
>> their (de)serialization to the pve-rs bindings and have the general
>> functionality in this crate.
>
> [snip]
>
> nit: imo, it's more convenient to expose the more ergonomic `&str` type,
> using:
>
> pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
> self.resources.iter().map(String::as_str)
> }
>
Thanks, will do that!
>> + pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
>> + self.resources.iter()
>> + }
>
> [snip]
>
>> + pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
>> + match &self.placement {
>> + ResourcePlacement::Stationary { current_node } => {
>> + self.placement = ResourcePlacement::Moving {
>> + current_node: current_node.to_string(),
>
> nit:
>
> current_node: current_node.to_owned(),
>
> represents the intention best, that is, owning rather than converting
>
> [snip]
Thanks, will do so for this and the rest!
[...]
>> + /// Add a node to the cluster usage.
>> + ///
>> + /// This method fails if a node with the same `nodename` already exists.
>> + pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
>> + if self.nodes.contains_key(&nodename) {
>> + bail!("node '{}' already exists", nodename);
>
> nit:
>
> bail!("node '{nodename}' already exists");
>
ACK
>> + }
>
> [snip]
>
> we are reading only, consider using a slice for `nodenames` here (just
> like for `remove_resource_from_nodes`):
>
> fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {
>
> pls find the related changes [0] and [1].
>
Right, that makes more sense, will go for that!
[...]
>> +#[test]
>> +fn test_no_duplicate_nodes() -> Result<(), Error> {
>> + let mut usage = Usage::new();
>> +
>> + usage.add_node("node1".to_string(), NodeStats::default())?;
>> +
>> + match usage.add_node("node1".to_string(), NodeStats::default()) {
>> + Ok(_) => bail!("cluster usage does allow duplicate node entries"),
>> + Err(_) => Ok(()),
>> + }
>
> since this is supposed to be a test case, I would rather assert instead
> of bail, using:
>
> assert!(
> usage
> .add_node("node1".to_string(), NodeStats::default())
> .is_err(),
> "cluster usage allows duplicate node entries"
> );
>
Right, that's more appropriate, will do so here and for all the
following, thanks!
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (4 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:29 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
` (33 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Iterator::min_by(...) and Iterator::max_by(...) only return `None` if
there are no entries in the `Matrix` column at all. This can only happen
if the `Matrix` doesn't have any row entries.
This will make any call to score_alternatives(...), the only current
user of IdealAlternatives::compute(...), panic if there are no given
alternatives. Therefore use reasonable default values.
This has not happened yet, because the only non-test caller of
score_alternatives(...) is score_nodes_to_start_resource(...), which
always has nodes present in production.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
This can happen with the next patch if
score_best_balancing_migration_candidates() is called with an empty
candidates vec, which is trivially possible for pve-ha-manager in a
cluster with high imbalance, but no configured HA resources or all HA
resources being so constrained that no migration is possible.
proxmox-resource-scheduling/src/topsis.rs | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/proxmox-resource-scheduling/src/topsis.rs b/proxmox-resource-scheduling/src/topsis.rs
index 6d078aa6..ed5a9bd1 100644
--- a/proxmox-resource-scheduling/src/topsis.rs
+++ b/proxmox-resource-scheduling/src/topsis.rs
@@ -145,8 +145,10 @@ impl<const N: usize> IdealAlternatives<N> {
let min = fixed_criterion
.clone()
.min_by(|a, b| a.total_cmp(b))
- .unwrap();
- let max = fixed_criterion.max_by(|a, b| a.total_cmp(b)).unwrap();
+ .unwrap_or(f64::NEG_INFINITY);
+ let max = fixed_criterion
+ .max_by(|a, b| a.total_cmp(b))
+ .unwrap_or(f64::INFINITY);
(best[n], worst[n]) = match criteria[n].maximize {
true => (max, min),
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
@ 2026-03-26 10:29 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:29 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
[snip]
> diff --git a/proxmox-resource-scheduling/src/topsis.rs b/proxmox-resource-scheduling/src/topsis.rs
> index 6d078aa6..ed5a9bd1 100644
> --- a/proxmox-resource-scheduling/src/topsis.rs
> +++ b/proxmox-resource-scheduling/src/topsis.rs
> @@ -145,8 +145,10 @@ impl<const N: usize> IdealAlternatives<N> {
> let min = fixed_criterion
> .clone()
> .min_by(|a, b| a.total_cmp(b))
> - .unwrap();
> - let max = fixed_criterion.max_by(|a, b| a.total_cmp(b)).unwrap();
> + .unwrap_or(f64::NEG_INFINITY);
> + let max = fixed_criterion
> + .max_by(|a, b| a.total_cmp(b))
> + .unwrap_or(f64::INFINITY);
that's a very nice idea!
[snip]
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (5 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:29 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
` (32 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Even though comparing by index is slightly faster here, comparing by
nodename makes it possible to factor this out in an upcoming patch.
This should increase runtime only marginally, as the comparison cost is
roughly bounded by 2 * node_count * maximum_hostname_length.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/scheduler.rs | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index bb38f238..47abffb1 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -61,18 +61,17 @@ impl Scheduler {
let matrix = self
.nodes
.iter()
- .enumerate()
- .map(|(target_index, _)| {
+ .map(|node| {
// Base values on percentages to allow comparing nodes with different stats.
let mut highest_cpu = 0.0;
let mut squares_cpu = 0.0;
let mut highest_mem = 0.0;
let mut squares_mem = 0.0;
- for (index, node) in self.nodes.iter().enumerate() {
- let mut new_stats = node.stats;
+ for target_node in self.nodes.iter() {
+ let mut new_stats = target_node.stats;
- if index == target_index {
+ if node.name == target_node.name {
new_stats.add_started_resource(&resource_stats)
};
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (6 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:30 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
` (31 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The same calculation will be needed for the scoring of migrations with
the TOPSIS method in the following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/scheduler.rs | 68 ++++++++++++--------
1 file changed, 42 insertions(+), 26 deletions(-)
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 47abffb1..69dc6f4e 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -43,6 +43,44 @@ impl Scheduler {
}
}
+ /// Map the current node usages to a [`PveTopsisAlternative`].
+ ///
+ /// The [`PveTopsisAlternative`] is derived by calculating a modified version of the root mean
+ /// square (RMS) and maximum value of each stat in the node usages.
+ fn topsis_alternative_with(
+ &self,
+ map_node_stats: impl Fn(&NodeUsage) -> NodeStats,
+ ) -> PveTopsisAlternative {
+ let len = self.nodes.len();
+
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for node in self.nodes.iter() {
+ let new_stats = map_node_stats(node);
+
+ let new_cpu = new_stats.cpu_load();
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = new_stats.mem_load();
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ }
+
/// Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// The scoring is done as if the resource is already started on each node. This assumes that
@@ -55,43 +93,21 @@ impl Scheduler {
&self,
resource_stats: T,
) -> Result<Vec<(String, f64)>, Error> {
- let len = self.nodes.len();
let resource_stats = resource_stats.into();
let matrix = self
.nodes
.iter()
.map(|node| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for target_node in self.nodes.iter() {
+ self.topsis_alternative_with(|target_node| {
let mut new_stats = target_node.stats;
if node.name == target_node.name {
new_stats.add_started_resource(&resource_stats)
- };
+ }
- let new_cpu = new_stats.cpu_load();
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = new_stats.mem_load();
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
+ new_stats
+ })
.into()
})
.collect::<Vec<_>>();
--
2.47.3
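The RMS-plus-max aggregation factored out in this patch can be sketched standalone. This is a simplified sketch, not the crate's API: `topsis_stats` is an illustrative name, and it assumes the per-node loads are already given as fractions.

```rust
// Sketch of the aggregation behind topsis_alternative_with: for a list of
// per-node loads (fractions), compute the root mean square and the maximum,
// each shifted by 1.0.
fn topsis_stats(loads: &[f64]) -> (f64, f64) {
    let len = loads.len() as f64;
    let mut highest = 0.0_f64;
    let mut squares = 0.0_f64;
    for &load in loads {
        highest = highest.max(load);
        squares += load.powi(2);
    }
    // Add 1.0 so tiny absolute differences are not boosted by the ratio
    // (0.004 is twice 0.002, but 1.004 is only slightly more than 1.002).
    (1.0 + (squares / len).sqrt(), 1.0 + highest)
}

fn main() {
    let (rms, max) = topsis_stats(&[0.5, 0.5, 0.5]);
    assert!((rms - 1.5).abs() < 1e-12);
    assert!((max - 1.5).abs() < 1e-12);
    println!("rms={rms} max={max}");
}
```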
* [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (7 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:34 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
` (30 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Assuming that a resource will hold the same dynamic resource usage on a
new node as on the previous node, score possible migrations, where:
- the cluster node imbalance is minimal (bruteforce), or
- the shifted root mean square and maximum resource usages of the cpu
and memory are minimal across the cluster nodes (TOPSIS).
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add saturating_sub() in remove_running_resource(...) (as suggested by
@Thomas)
- slightly move declarations and impls around so that reading from
top-to-bottom is a little easier
- pass NodeUsage vec instead of NodeStats vec to
calculate_node_imbalance(...)
- pass a closure to calculate_node_imbalance(...) (as suggested by
@Dominik)
- also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
struct is now ordered first by the imbalance and then the strings in
the `Migration` struct
- fix floating-point issue for the imbalance ordering for
ScoredMigration
- correctly implement `Ord` (essentially removing the reverse() and
moving these Reverse() wrappers to the usages for the BinaryHeap)
- use the `Migration` struct in `MigrationCandidate` as well
- drop Scheduler::node_stats() as it's unused now
- use Vec::with_capacity(...) where possible
- eagerly implement common traits (especially Clone and Debug)
- add test cases for the ScoredMigration ordering, node imbalance
calculation and the two rebalancing migration scoring methods
- s/score_best_balancing_migrations
/score_best_balancing_migration_candidates
to possibly allow the Scheduler/Usage impls handling the migration
candidate generation in the future instead of the callers
proxmox-resource-scheduling/src/node.rs | 17 ++
proxmox-resource-scheduling/src/scheduler.rs | 282 ++++++++++++++++++
.../tests/scheduler.rs | 169 ++++++++++-
3 files changed, 467 insertions(+), 1 deletion(-)
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index be462782..2dcef75e 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -29,6 +29,18 @@ impl NodeStats {
self.mem += resource_stats.maxmem;
}
+ /// Adds the resource stats to the node stats as if the resource is running on the node.
+ pub fn add_running_resource(&mut self, resource_stats: &ResourceStats) {
+ self.cpu += resource_stats.cpu;
+ self.mem += resource_stats.mem;
+ }
+
+ /// Removes the resource stats from the node stats as if the resource is not running on the node.
+ pub fn remove_running_resource(&mut self, resource_stats: &ResourceStats) {
+ self.cpu -= resource_stats.cpu;
+ self.mem = self.mem.saturating_sub(resource_stats.mem);
+ }
+
/// Returns the current cpu usage as a percentage.
pub fn cpu_load(&self) -> f64 {
self.cpu / self.maxcpu as f64
@@ -38,6 +50,11 @@ impl NodeStats {
pub fn mem_load(&self) -> f64 {
self.mem as f64 / self.maxmem as f64
}
+
+ /// Returns a combined node usage as a percentage.
+ pub fn load(&self) -> f64 {
+ (self.cpu_load() + self.mem_load()) / 2.0
+ }
}
/// A node in the cluster context.
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 69dc6f4e..a25babad 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -2,6 +2,12 @@ use anyhow::Error;
use crate::{node::NodeStats, resource::ResourceStats, topsis};
+use serde::{Deserialize, Serialize};
+use std::{
+ cmp::{Ordering, Reverse},
+ collections::BinaryHeap,
+};
+
/// The scheduler view of a node.
#[derive(Clone, Debug)]
pub struct NodeUsage {
@@ -11,6 +17,36 @@ pub struct NodeUsage {
pub stats: NodeStats,
}
+/// Returns the load imbalance among the nodes.
+///
+/// The load balance is measured as the statistical dispersion of the individual node loads.
+///
+/// The current implementation uses the dimensionless coefficient of variation, which expresses the
+/// standard deviation in relation to the average mean of the node loads.
+///
+/// The coefficient of variation is not robust, which is a desired property here, because outliers
+/// should be detected as much as possible.
+fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
+ let node_count = nodes.len();
+ let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
+
+ let load_sum = node_loads.iter().sum::<f64>();
+
+ // load_sum is guaranteed to be -0.0 for empty `nodes`
+ if load_sum == 0.0 {
+ 0.0
+ } else {
+ let load_mean = load_sum / node_count as f64;
+
+ let squared_diff_sum = node_loads
+ .iter()
+ .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
+ let load_sd = (squared_diff_sum / node_count as f64).sqrt();
+
+ load_sd / load_mean
+ }
+}
+
criteria_struct! {
/// A given alternative.
struct PveTopsisAlternative {
@@ -32,6 +68,83 @@ pub struct Scheduler {
nodes: Vec<NodeUsage>,
}
+/// A possible migration.
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct Migration {
+ /// The identifier of a leading resource.
+ pub sid: String,
+ /// The current node of the leading resource.
+ pub source_node: String,
+ /// The possible migration target node for the resource.
+ pub target_node: String,
+}
+
+/// A possible migration with a score.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct ScoredMigration {
+ /// The possible migration.
+ pub migration: Migration,
+ /// The expected node imbalance after the migration.
+ pub imbalance: f64,
+}
+
+impl Ord for ScoredMigration {
+ fn cmp(&self, other: &Self) -> Ordering {
+ self.imbalance
+ .total_cmp(&other.imbalance)
+ .then(self.migration.cmp(&other.migration))
+ }
+}
+
+impl PartialOrd for ScoredMigration {
+ fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+ Some(self.cmp(other))
+ }
+}
+
+impl PartialEq for ScoredMigration {
+ fn eq(&self, other: &Self) -> bool {
+ self.cmp(other) == Ordering::Equal
+ }
+}
+
+impl Eq for ScoredMigration {}
+
+impl ScoredMigration {
+ pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
+ // Depending how the imbalance is calculated, it can contain minor approximation errors. As
+ // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
+ // where the imbalance is the same up to the significant digits in base 10, but treated as
+ // different values.
+ //
+ // Therefore, truncate any non-significant digits to prevent these cases.
+ let factor = 10_f64.powf(f64::DIGITS as f64);
+ let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
+
+ Self {
+ migration: migration.into(),
+ imbalance: truncated_imbalance,
+ }
+ }
+}
+
+/// A possible migration candidate with the migrated usage stats.
+#[derive(Clone, Debug)]
+pub struct MigrationCandidate {
+ /// The possible migration.
+ pub migration: Migration,
+ /// The to-be-migrated resource usage stats.
+ pub stats: ResourceStats,
+}
+
+impl From<MigrationCandidate> for Migration {
+ fn from(candidate: MigrationCandidate) -> Self {
+ candidate.migration
+ }
+}
+
impl Scheduler {
/// Instantiate scheduler instance from node usages.
pub fn from_nodes<I>(nodes: I) -> Self
@@ -81,6 +194,123 @@ impl Scheduler {
}
}
+ /// Returns the load imbalance among the nodes.
+ ///
+ /// See [`calculate_node_imbalance`] for more information.
+ pub fn node_imbalance(&self) -> f64 {
+ calculate_node_imbalance(&self.nodes, |node| node.stats.load())
+ }
+
+ /// Returns the load imbalance among the nodes as if a specific resource was moved.
+ ///
+ /// See [`calculate_node_imbalance`] for more information.
+ fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64 {
+ calculate_node_imbalance(&self.nodes, |node| {
+ let mut new_stats = node.stats;
+
+ if node.name == candidate.migration.source_node {
+ new_stats.remove_running_resource(&candidate.stats);
+ } else if node.name == candidate.migration.target_node {
+ new_stats.add_running_resource(&candidate.stats);
+ }
+
+ new_stats.load()
+ })
+ }
+
+ /// Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+ /// done whether the given nodenames actually exist in the scheduler.
+ ///
+ /// The scoring is done as if each resource migration has already been done. This assumes that
+ /// the already migrated resource consumes the same amount of each stat as on the previous node
+ /// according to its `stats`.
+ ///
+ /// Returns up to `limit` of the best scored migrations.
+ pub fn score_best_balancing_migration_candidates<I>(
+ &self,
+ candidates: I,
+ limit: usize,
+ ) -> Vec<ScoredMigration>
+ where
+ I: IntoIterator<Item = MigrationCandidate>,
+ {
+ let mut scored_migrations = candidates
+ .into_iter()
+ .map(|candidate| {
+ let imbalance = self.node_imbalance_with_migration_candidate(&candidate);
+
+ Reverse(ScoredMigration::new(candidate, imbalance))
+ })
+ .collect::<BinaryHeap<_>>();
+
+ let mut best_migrations = Vec::with_capacity(limit);
+
+ // BinaryHeap::into_iter_sorted() is still in nightly unfortunately
+ while best_migrations.len() < limit {
+ match scored_migrations.pop() {
+ Some(Reverse(alternative)) => best_migrations.push(alternative),
+ None => break,
+ }
+ }
+
+ best_migrations
+ }
+
+ /// Scores the given migration `candidates` by the best node imbalance improvement with the
+ /// TOPSIS method.
+ ///
+ /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+ /// done whether the given nodenames actually exist in the scheduler.
+ ///
+ /// The scoring is done as if each resource migration has already been done. This assumes that
+ /// the already migrated resource consumes the same amount of each stat as on the previous node
+ /// according to its `stats`.
+ ///
+ /// Returns up to `limit` of the best scored migrations.
+ pub fn score_best_balancing_migration_candidates_topsis(
+ &self,
+ candidates: &[MigrationCandidate],
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let matrix = candidates
+ .iter()
+ .map(|candidate| {
+ let resource_stats = &candidate.stats;
+ let source_node = &candidate.migration.source_node;
+ let target_node = &candidate.migration.target_node;
+
+ self.topsis_alternative_with(|node| {
+ let mut new_stats = node.stats;
+
+ if &node.name == source_node {
+ new_stats.remove_running_resource(resource_stats);
+ } else if &node.name == target_node {
+ new_stats.add_running_resource(resource_stats);
+ }
+
+ new_stats
+ })
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let best_alternatives =
+ topsis::rank_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(best_alternatives
+ .into_iter()
+ .take(limit)
+ .map(|i| {
+ let imbalance = self.node_imbalance_with_migration_candidate(&candidates[i]);
+
+ ScoredMigration::new(candidates[i].clone(), imbalance)
+ })
+ .collect())
+ }
+
/// Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// The scoring is done as if the resource is already started on each node. This assumes that
@@ -122,3 +352,55 @@ impl Scheduler {
.collect())
}
}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_scored_migration_order() {
+ let migration1 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:102"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ },
+ 0.7231749488916931,
+ );
+ let migration2 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:102"),
+ source_node: String::from("node1"),
+ target_node: String::from("node3"),
+ },
+ 0.723174948891693,
+ );
+ let migration3 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ },
+ 0.723174948891693 + 1e-15,
+ );
+
+ let mut migrations = vec![migration2.clone(), migration3.clone(), migration1.clone()];
+
+ migrations.sort();
+
+ assert_eq!(
+ vec![migration1.clone(), migration2.clone(), migration3.clone()],
+ migrations
+ );
+
+ let mut heap = BinaryHeap::from(vec![
+ Reverse(migration2.clone()),
+ Reverse(migration3.clone()),
+ Reverse(migration1.clone()),
+ ]);
+
+ assert_eq!(heap.pop(), Some(Reverse(migration1)));
+ assert_eq!(heap.pop(), Some(Reverse(migration2)));
+ assert_eq!(heap.pop(), Some(Reverse(migration3)));
+ }
+}
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
index c7a9dab9..8672f40d 100644
--- a/proxmox-resource-scheduling/tests/scheduler.rs
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -2,9 +2,13 @@ use anyhow::Error;
use proxmox_resource_scheduling::{
node::NodeStats,
resource::ResourceStats,
- scheduler::{NodeUsage, Scheduler},
+ scheduler::{Migration, MigrationCandidate, NodeUsage, Scheduler, ScoredMigration},
};
+fn new_empty_cluster_scheduler() -> Scheduler {
+ Scheduler::from_nodes(Vec::<NodeUsage>::new())
+}
+
fn new_homogeneous_cluster_scheduler() -> Scheduler {
let (maxcpu, maxmem) = (16, 64 * (1 << 30));
@@ -75,6 +79,169 @@ fn new_heterogeneous_cluster_scheduler() -> Scheduler {
Scheduler::from_nodes(vec![node1, node2, node3])
}
+#[test]
+fn test_node_imbalance_with_empty_cluster() {
+ let scheduler = new_empty_cluster_scheduler();
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+#[test]
+fn test_node_imbalance_with_perfectly_balanced_cluster() {
+ let node = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu: 16,
+ mem: 224395264,
+ maxmem: 68719476736,
+ },
+ };
+
+ let scheduler = Scheduler::from_nodes(vec![node.clone()]);
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+
+ let scheduler = Scheduler::from_nodes(vec![node.clone(), node.clone(), node]);
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+fn new_simple_migration_candidates() -> (Vec<MigrationCandidate>, Migration, Migration) {
+ let migration1 = Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ };
+ let migration2 = Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node3"),
+ };
+ let stats = ResourceStats {
+ cpu: 0.7,
+ maxcpu: 4.0,
+ mem: 8 << 30,
+ maxmem: 16 << 30,
+ };
+
+ let candidates = vec![
+ MigrationCandidate {
+ migration: migration1.clone(),
+ stats,
+ },
+ MigrationCandidate {
+ migration: migration2.clone(),
+ stats,
+ },
+ ];
+
+ (candidates, migration1, migration2)
+}
+
+fn assert_imbalance(imbalance: f64, expected_imbalance: f64) {
+ assert!(
+ (expected_imbalance - imbalance).abs() <= f64::EPSILON,
+ "imbalance is {imbalance}, but was expected to be {expected_imbalance}"
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_with_no_candidates() {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(vec![], 2),
+ vec![]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_homogeneous_cluster() {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(candidates, 2),
+ vec![
+ ScoredMigration::new(migration2.clone(), 0.5972874658664057),
+ ScoredMigration::new(migration1.clone(), 0.7239828690397611)
+ ]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_heterogeneous_cluster() {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(candidates, 2),
+ vec![
+ ScoredMigration::new(migration2, 0.525031850557711),
+ ScoredMigration::new(migration1, 0.5794177040605537)
+ ]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_with_no_candidates() -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&vec![], 2)?,
+ vec![]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_homogeneous_cluster(
+) -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&candidates, 2)?,
+ vec![
+ ScoredMigration::new(migration1.clone(), 0.7239828690397611),
+ ScoredMigration::new(migration2.clone(), 0.5972874658664057),
+ ]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_heterogeneous_cluster(
+) -> Result<(), Error> {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&candidates, 2)?,
+ vec![
+ ScoredMigration::new(migration1, 0.5794177040605537),
+ ScoredMigration::new(migration2, 0.525031850557711),
+ ]
+ );
+
+ Ok(())
+}
+
fn rank_nodes_to_start_resource(
scheduler: &Scheduler,
resource_stats: ResourceStats,
--
2.47.3
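The coefficient-of-variation metric introduced in this patch can be sketched standalone. This is a simplified illustration, not the crate's API: `node_imbalance` here takes plain load fractions rather than `NodeUsage` values plus a closure.

```rust
// Sketch of the imbalance metric: the coefficient of variation, i.e. the
// standard deviation of the node loads divided by their mean.
fn node_imbalance(loads: &[f64]) -> f64 {
    let n = loads.len() as f64;
    let sum: f64 = loads.iter().sum();
    if sum == 0.0 {
        return 0.0; // also covers the empty-cluster case
    }
    let mean = sum / n;
    let var = loads.iter().map(|&l| (l - mean).powi(2)).sum::<f64>() / n;
    var.sqrt() / mean
}

fn main() {
    // Perfectly balanced nodes have (numerically near-)zero imbalance.
    assert!(node_imbalance(&[0.25, 0.25, 0.25]) < 1e-12);
    // An empty cluster yields zero as well.
    assert_eq!(node_imbalance(&[]), 0.0);
    println!("ok");
}
```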
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-26 10:34 ` Dominik Rusovac
2026-03-26 14:11 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:34 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly relating to nits or tiny things
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> Assuming that a resource will hold the same dynamic resource usage on a
> new node as on the previous node, score possible migrations, where:
>
> - the cluster node imbalance is minimal (bruteforce), or
> - the shifted root mean square and maximum resource usages of the cpu
> and memory is minimal across the cluster nodes (TOPSIS).
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - add saturating_sub() in remove_running_resource(...) (as suggested by
> @Thomas)
> - slightly move declarations and impls around so that reading from
> top-to-bottom is a little easier
> - pass NodeUsage vec instead of NodeStats vec to
> calculate_node_imbalance(...)
> - pass a closure to calculate_node_imbalance(...) (as suggested by
> @Dominik)
> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
> struct is now ordered first by the imbalance and then the strings in
> the `Migration` struct
> - fix floating-point issue for the imbalance ordering for
> ScoredMigration
> - correctly implement `Ord` (essentially removing the reverse() and
> moving these Reverse() wrappers to the usages for the BinaryHeap)
> - use the `Migration` struct in `MigrationCandidate` as well
> - drop Scheduler::node_stats() as it's unused now
> - use Vec::with_capacity(...) where possible
> - eagerly implement common traits (especially Clone and Debug)
> - add test cases for the ScoredMigration ordering, node imbalance
> calculation and the two rebalancing migration scoring methods
> - s/score_best_balancing_migrations
> /score_best_balancing_migration_candidates
> to possibly allow the Scheduler/Usage impls handling the migration
> candidate generation in the future instead of the callers
[snip]
> +/// Returns the load imbalance among the nodes.
> +///
> +/// The load balance is measured as the statistical dispersion of the individual node loads.
> +///
> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
> +/// standard deviation in relation to the average mean of the node loads.
> +///
> +/// The coefficient of variation is not robust, which is a desired property here, because outliers
> +/// should be detected as much as possible.
> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
very nice docs!
[snip]
> +/// A possible migration.
> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
> +#[serde(rename_all = "kebab-case")]
> +pub struct Migration {
> + /// The identifier of a leading resource.
> + pub sid: String,
> + /// The current node of the leading resource.
> + pub source_node: String,
> + /// The possible migration target node for the resource.
> + pub target_node: String,
nit: in the long run, instead of having `ScoredMigration`,
it could be more convenient to have a field:
pub imbalance: Option<f64>,
> +}
[snip]
> +impl ScoredMigration {
> + pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
> + // Depending how the imbalance is calculated, it can contain minor approximation errors. As
// Depending [on] how [...]
> + // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
> + // where the imbalance is the same up to the significant digits in base 10, but treated as
> + // different values.
> + //
> + // Therefore, truncate any non-significant digits to prevent these cases.
> + let factor = 10_f64.powf(f64::DIGITS as f64);
> + let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
Nice solution; this appears to be a clean approach to achieve a deterministic `Ord`
for `f64`.
One small thing, though: `f64::DIGITS` is technically not a floating-point number, but `15_u32`.
let factor = 10_f64.powi(f64::DIGITS as i32);
thus seems to be the better choice here. `powi` is also generally faster than `powf` [0].
[0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=Using%20this%20function%20is%20generally%20faster%20than%20using%20powf
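A quick standalone check of the suggested `powi` variant (a sketch; the tolerance is illustrative):

```rust
fn main() {
    // f64::DIGITS is 15_u32, so the integer-exponent powi fits naturally.
    let factor = 10_f64.powi(f64::DIGITS as i32);
    // 10^15 is an integer below 2^53, so it is exactly representable.
    assert_eq!(factor, 1e15);

    // Truncation keeps roughly 15 significant decimal digits, so the
    // truncated value stays very close to the original imbalance.
    let imbalance = 0.7231749488916931_f64;
    let truncated = f64::trunc(factor * imbalance) / factor;
    assert!((truncated - imbalance).abs() < 2e-15);
    println!("factor={factor}, truncated={truncated}");
}
```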
[snip]
> +/// A possible migration candidate with the migrated usage stats.
> +#[derive(Clone, Debug)]
> +pub struct MigrationCandidate {
> + /// The possible migration.
> + pub migration: Migration,
> + /// The to-be-migrated resource usage stats.
imo, easier to comprehend:
/// Usage stats of the resource to be migrated
> + pub stats: ResourceStats,
> +}
[snip]
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-26 10:34 ` Dominik Rusovac
@ 2026-03-26 14:11 ` Daniel Kral
2026-03-27 9:34 ` Dominik Rusovac
0 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:11 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:34 AM CET, Dominik Rusovac wrote:
> lgtm
>
> pls find my comments inline, mostly relating to nits or tiny things
>
> On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
>> Assuming that a resource will hold the same dynamic resource usage on a
>> new node as on the previous node, score possible migrations, where:
>>
>> - the cluster node imbalance is minimal (bruteforce), or
>> - the shifted root mean square and maximum resource usages of the cpu
>> and memory is minimal across the cluster nodes (TOPSIS).
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> changes v1 -> v2:
>> - add saturating_sub() in remove_running_resource(...) (as suggested by
>> @Thomas)
>> - slightly move declarations and impls around so that reading from
>> top-to-bottom is a little easier
>> - pass NodeUsage vec instead of NodeStats vec to
>> calculate_node_imbalance(...)
>> - pass a closure to calculate_node_imbalance(...) (as suggested by
>> @Dominik)
>> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
>> struct is now ordered first by the imbalance and then the strings in
>> the `Migration` struct
>> - fix floating-point issue for the imbalance ordering for
>> ScoredMigration
>> - correctly implement `Ord` (essentially removing the reverse() and
>> moving these Reverse() wrappers to the usages for the BinaryHeap)
>> - use the `Migration` struct in `MigrationCandidate` as well
>> - drop Scheduler::node_stats() as it's unused now
>> - use Vec::with_capacity(...) where possible
>> - eagerly implement common traits (especially Clone and Debug)
>> - add test cases for the ScoredMigration ordering, node imbalance
>> calculation and the two rebalancing migration scoring methods
>> - s/score_best_balancing_migrations
>> /score_best_balancing_migration_candidates
>> to possibly allow the Scheduler/Usage impls handling the migration
>> candidate generation in the future instead of the callers
>
> [snip]
>
>> +/// Returns the load imbalance among the nodes.
>> +///
>> +/// The load balance is measured as the statistical dispersion of the individual node loads.
>> +///
>> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
>> +/// standard deviation in relation to the average mean of the node loads.
>> +///
>> +/// The coefficient of variation is not robust, which is a desired property here, because outliers
>> +/// should be detected as much as possible.
>> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
>
> very nice docs!
>
> [snip]
>
>> +/// A possible migration.
>> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
>> +#[serde(rename_all = "kebab-case")]
>> +pub struct Migration {
>> + /// The identifier of a leading resource.
>> + pub sid: String,
>> + /// The current node of the leading resource.
>> + pub source_node: String,
>> + /// The possible migration target node for the resource.
>> + pub target_node: String,
>
> nit: on the long run, instead of having `ScoredMigration`,
> it could be more convenient to have a field:
>
> pub imbalance: Option<f64>,
>
Might make sense, but then we can't reuse the same structure in
`MigrationCandidate` anymore, so I would let ScoredMigration be its own
type. What do you think?
>> +}
>
> [snip]
>
>> +impl ScoredMigration {
>> + pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
>> + // Depending how the imbalance is calculated, it can contain minor approximation errors. As
>
> // Depending [on] how [...]
Thanks, will change that!
>
>> + // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
>> + // where the imbalance is the same up to the significant digits in base 10, but treated as
>> + // different values.
>> + //
>> + // Therefore, truncate any non-significant digits to prevent these cases.
>> + let factor = 10_f64.powf(f64::DIGITS as f64);
>> + let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
>
> Nice solution, this appears to be a clean approach to achieve deterministic `Ord`
> for `f64`.
>
> One small thing, tho: `f64::DIGITS` is technically not a floating number, but `15_u32`.
>
> let factor = 10_f64.powi(f64::DIGITS as i32);
>
> thus, seems to be the better choice here. `powi` is also generally faster than `powf` [0].
>
> [0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=Using%20this%20function%20is%20generally%20faster%20than%20using%20powf
Thanks, good catch! I also wasn't aware of this being non-deterministic
here, that's good to know.
We also briefly talked about this off-list; I'll adapt the test cases
below to not eq() the ScoredMigration directly, as the truncation here
might be non-deterministic (I had assumed otherwise).
Moreover, it's more important to verify the order in which the migrations
are scored than their exact imbalance score, as you suggested.
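An order-only assertion could be sketched like this (illustrative names and values, not the crate's API):

```rust
fn main() {
    // Score two hypothetical migrations; only the relative order matters,
    // not the exact imbalance values.
    let mut scored = vec![
        ("vm:101 -> node2", 0.7239828690397611_f64),
        ("vm:101 -> node3", 0.5972874658664057_f64),
    ];
    // Sort ascending by imbalance using a total order on f64.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    let order: Vec<&str> = scored.iter().map(|(m, _)| *m).collect();
    assert_eq!(order, ["vm:101 -> node3", "vm:101 -> node2"]);
    println!("{order:?}");
}
```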
>
> [snip]
>
>> +/// A possible migration candidate with the migrated usage stats.
>> +#[derive(Clone, Debug)]
>> +pub struct MigrationCandidate {
>> + /// The possible migration.
>> + pub migration: Migration,
>> + /// The to-be-migrated resource usage stats.
>
> imo, easier to comprehend:
>
> /// Usage stats of the resource to be migrated
Nice, will change it to that!
>
>> + pub stats: ResourceStats,
>> +}
>
> [snip]
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-26 14:11 ` Daniel Kral
@ 2026-03-27 9:34 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:34 UTC (permalink / raw)
To: Daniel Kral, pve-devel
On Thu Mar 26, 2026 at 3:11 PM CET, Daniel Kral wrote:
[snip]
>>> +/// A possible migration.
>>> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
>>> +#[serde(rename_all = "kebab-case")]
>>> +pub struct Migration {
>>> + /// The identifier of a leading resource.
>>> + pub sid: String,
>>> + /// The current node of the leading resource.
>>> + pub source_node: String,
>>> + /// The possible migration target node for the resource.
>>> + pub target_node: String,
>>
>> nit: on the long run, instead of having `ScoredMigration`,
>> it could be more convenient to have a field:
>>
>> pub imbalance: Option<f64>,
>>
>
> Might make sense, but then we can't reuse the same structure in
> `MigrationCandidate` anymore, so I would let ScoredMigration be its own
> type. What do you think?
>
ok, then let's keep it as-is
[snip]
* [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (8 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:38 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
` (29 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The error can only happen due to an error in
add_service_usage_to_node(...), but it prevents all of the following
service_nodes entries from being cleaned up correctly.
While technically an API break, removing the error does not change any
callers, which do not handle the error anyway. Additionally,
remove_node(...) is only used in testing code in this package and
pve-ha-manager, and is currently unused in production code.
This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 5b91d36..6e57b9d 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -75,25 +75,16 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Remove a node from the scheduler.
#[export]
- pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> Result<(), Error> {
+ pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
let mut usage = this.inner.lock().unwrap();
if let Some(node) = usage.nodes.remove(nodename) {
for (sid, _) in node.services.iter() {
- match usage.service_nodes.get_mut(sid) {
- Some(service_nodes) => {
- service_nodes.remove(nodename);
- }
- None => bail!(
- "service '{}' not present in service_nodes hashmap while removing node '{}'",
- sid,
- nodename
- ),
+ if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
+ service_nodes.remove(nodename);
}
}
}
-
- Ok(())
}
/// Method: Get a list of all the nodes in the scheduler.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
@ 2026-03-27 9:38 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:38 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The error can only happen due to an error in
> add_service_usage_to_node(...), but it prevents all of the following
> service_nodes entries from being cleaned up correctly.
>
> While technically an API break, removing the error does not change any
> callers, which do not handle the error anyway. Additionally,
> remove_node(...) is only used in testing code in this package and
> pve-ha-manager, and is currently unused in production code.
>
> This change makes the implementation more consistent with the new
> proxmox_resource_scheduling::usage::Usage, which will replace this in
> a following patch.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (9 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:39 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
` (28 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The error can only happen due to an error in
add_service_usage_to_node(...), but it prevents all of the following
node services entries from being cleaned up correctly.
While technically an API break, removing the error does not change any
callers.
This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 6e57b9d..b8eac57 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -145,25 +145,16 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Remove service `sid` and its usage from all assigned nodes.
#[export]
- fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) -> Result<(), Error> {
+ fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
let mut usage = this.inner.lock().unwrap();
if let Some(nodes) = usage.service_nodes.remove(sid) {
for nodename in &nodes {
- match usage.nodes.get_mut(nodename) {
- Some(node) => {
- node.services.remove(sid);
- }
- None => bail!(
- "service '{}' not present in usage hashmap on node '{}'",
- sid,
- nodename
- ),
+ if let Some(node) = usage.nodes.get_mut(nodename) {
+ node.services.remove(sid);
}
}
}
-
- Ok(())
}
/// Scores all previously added nodes for starting a `service` on.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
@ 2026-03-27 9:39 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:39 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The error can only happen due to an error in
> add_service_usage_to_node(...), but it prevents all of the following
> node services entries from being cleaned up correctly.
>
> While technically an API break, removing the error does not change any
> callers.
>
> This change makes the implementation more consistent with the new
> proxmox_resource_scheduling::usage::Usage, which will replace this in
> a following patch.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (10 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:41 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
` (27 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is in preparation for adding the upcoming pve_dynamic bindings, which
share much of the same code paths as the pve_static implementation.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- move it in front of other changes done to pve_static so the code that
is shared with the upcoming pve_dynamic can already be put in separate
modules (as suggested by @Thomas)
- add more context and motivation to patch message
pve-rs/src/bindings/mod.rs | 3 +--
pve-rs/src/bindings/resource_scheduling/mod.rs | 4 ++++
.../pve_static.rs} | 2 +-
3 files changed, 6 insertions(+), 3 deletions(-)
create mode 100644 pve-rs/src/bindings/resource_scheduling/mod.rs
rename pve-rs/src/bindings/{resource_scheduling_static.rs => resource_scheduling/pve_static.rs} (98%)
diff --git a/pve-rs/src/bindings/mod.rs b/pve-rs/src/bindings/mod.rs
index c21b328..853a3dd 100644
--- a/pve-rs/src/bindings/mod.rs
+++ b/pve-rs/src/bindings/mod.rs
@@ -3,8 +3,7 @@
mod oci;
pub use oci::pve_rs_oci;
-mod resource_scheduling_static;
-pub use resource_scheduling_static::pve_rs_resource_scheduling_static;
+pub mod resource_scheduling;
mod tfa;
pub use tfa::pve_rs_tfa;
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
new file mode 100644
index 0000000..af1fb6b
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -0,0 +1,4 @@
+//! Resource scheduling related bindings.
+
+mod pve_static;
+pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
similarity index 98%
rename from pve-rs/src/bindings/resource_scheduling_static.rs
rename to pve-rs/src/bindings/resource_scheduling/pve_static.rs
index b8eac57..a83a9ab 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -2,7 +2,7 @@
pub mod pve_rs_resource_scheduling_static {
//! The `PVE::RS::ResourceScheduling::Static` package.
//!
- //! Provides bindings for the resource scheduling module.
+ //! Provides bindings for the static resource scheduling module.
//!
//! See [`proxmox_resource_scheduling`].
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
@ 2026-03-27 9:41 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:41 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is in preparation for adding the upcoming pve_dynamic bindings, which
> share much of the same code paths as the pve_static implementation.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - move it in front of other changes done to pve_static so the code that
> is shared with the upcoming pve_dynamic can already be put in separate
> modules (as suggested by @Thomas)
> - add more context and motivation to patch message
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (11 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:13 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
` (26 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The proxmox_resource_scheduling crate provides a generic usage
implementation, which is backwards compatible with the pve_static
bindings. This reduces the static resource scheduling bindings to a
slightly thinner wrapper.
This also exposes the new `add_resource(...)` binding, which allows
callers to add services with additional state other than the usage
stats. It is exposed as `add_service(...)` to be consistent with the
naming of the rest of the existing methods.
Where it is sensible for the bindings, the documentation is extended
with a link to the documentation of the underlying methods.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add patch message for context
- change from only creating the
proxmox_resource_scheduling::scheduler::ClusterUsage (now,
proxmox_resource_scheduling::scheduler::Scheduler), to using the new
but backwards-compatible `Usage` implementation instead
- this essentially also squashes the 'store services stats independently
of node' patch in here as this is also tracked by the generic `Usage`
impl
- add `usage` and `resource` modules for shared code
.../src/bindings/resource_scheduling/mod.rs | 3 +
.../resource_scheduling/pve_static.rs | 152 ++++++------------
.../bindings/resource_scheduling/resource.rs | 44 +++++
.../src/bindings/resource_scheduling/usage.rs | 33 ++++
4 files changed, 132 insertions(+), 100 deletions(-)
create mode 100644 pve-rs/src/bindings/resource_scheduling/resource.rs
create mode 100644 pve-rs/src/bindings/resource_scheduling/usage.rs
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index af1fb6b..9ce631c 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -1,4 +1,7 @@
//! Resource scheduling related bindings.
+mod resource;
+mod usage;
+
mod pve_static;
pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index a83a9ab..3d9f142 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -6,40 +6,34 @@ pub mod pve_rs_resource_scheduling_static {
//!
//! See [`proxmox_resource_scheduling`].
- use std::collections::{HashMap, HashSet};
use std::sync::Mutex;
- use anyhow::{Error, bail};
+ use anyhow::Error;
use perlmod::Value;
- use proxmox_resource_scheduling::pve_static::{StaticNodeUsage, StaticServiceUsage};
+ use proxmox_resource_scheduling::node::NodeStats;
+ use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+ use proxmox_resource_scheduling::usage::Usage;
+
+ use crate::bindings::resource_scheduling::{
+ resource::PveResource, usage::StartedResourceAggregator,
+ };
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
- struct StaticNodeInfo {
- name: String,
- maxcpu: usize,
- maxmem: usize,
- services: HashMap<String, StaticServiceUsage>,
- }
-
- struct Usage {
- nodes: HashMap<String, StaticNodeInfo>,
- service_nodes: HashMap<String, HashSet<String>>,
- }
-
- /// A scheduler instance contains the resource usage by node.
+ /// A scheduler instance contains the cluster usage.
pub struct Scheduler {
inner: Mutex<Usage>,
}
+ type StaticResource = PveResource<StaticServiceUsage>;
+
/// Class method: Create a new [`Scheduler`] instance.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::new`].
#[export(raw_return)]
pub fn new(#[raw] class: Value) -> Result<Value, Error> {
- let inner = Usage {
- nodes: HashMap::new(),
- service_nodes: HashMap::new(),
- };
+ let inner = Usage::new();
Ok(perlmod::instantiate_magic!(
&class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
@@ -48,7 +42,7 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Add a node with its basic CPU and memory info.
///
- /// This inserts a [`StaticNodeInfo`] entry for the node into the scheduler instance.
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
#[export]
pub fn add_node(
#[try_from_ref] this: &Scheduler,
@@ -58,33 +52,24 @@ pub mod pve_rs_resource_scheduling_static {
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
- if usage.nodes.contains_key(&nodename) {
- bail!("node {} already added", nodename);
- }
-
- let node = StaticNodeInfo {
- name: nodename.clone(),
+ let stats = NodeStats {
+ cpu: 0.0,
maxcpu,
+ mem: 0,
maxmem,
- services: HashMap::new(),
};
- usage.nodes.insert(nodename, node);
- Ok(())
+ usage.add_node(nodename, stats)
}
/// Method: Remove a node from the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
#[export]
pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
let mut usage = this.inner.lock().unwrap();
- if let Some(node) = usage.nodes.remove(nodename) {
- for (sid, _) in node.services.iter() {
- if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
- service_nodes.remove(nodename);
- }
- }
- }
+ usage.remove_node(nodename);
}
/// Method: Get a list of all the nodes in the scheduler.
@@ -93,8 +78,7 @@ pub mod pve_rs_resource_scheduling_static {
let usage = this.inner.lock().unwrap();
usage
- .nodes
- .keys()
+ .nodenames_iter()
.map(|nodename| nodename.to_string())
.collect()
}
@@ -104,10 +88,26 @@ pub mod pve_rs_resource_scheduling_static {
pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
let usage = this.inner.lock().unwrap();
- usage.nodes.contains_key(nodename)
+ usage.contains_node(nodename)
+ }
+
+ /// Method: Add `service` with identifier `sid` to the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+ #[export]
+ pub fn add_service(
+ #[try_from_ref] this: &Scheduler,
+ sid: String,
+ service: StaticResource,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_resource(sid, service.try_into()?)
}
/// Method: Add service `sid` and its `service_usage` to the node.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource_usage_to_node`].
#[export]
pub fn add_service_usage_to_node(
#[try_from_ref] this: &Scheduler,
@@ -117,81 +117,33 @@ pub mod pve_rs_resource_scheduling_static {
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
- match usage.nodes.get_mut(nodename) {
- Some(node) => {
- if node.services.contains_key(sid) {
- bail!("service '{}' already added to node '{}'", sid, nodename);
- }
-
- node.services.insert(sid.to_string(), service_usage);
- }
- None => bail!("node '{}' not present in usage hashmap", nodename),
- }
-
- if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
- if service_nodes.contains(nodename) {
- bail!("node '{}' already added to service '{}'", nodename, sid);
- }
-
- service_nodes.insert(nodename.to_string());
- } else {
- let mut service_nodes = HashSet::new();
- service_nodes.insert(nodename.to_string());
- usage.service_nodes.insert(sid.to_string(), service_nodes);
- }
-
- Ok(())
+ // TODO Only for backwards compatibility, can be removed with a proper version bump
+ #[allow(deprecated)]
+ usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
}
/// Method: Remove service `sid` and its usage from all assigned nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
#[export]
fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
let mut usage = this.inner.lock().unwrap();
- if let Some(nodes) = usage.service_nodes.remove(sid) {
- for nodename in &nodes {
- if let Some(node) = usage.nodes.get_mut(nodename) {
- node.services.remove(sid);
- }
- }
- }
+ usage.remove_resource(sid);
}
- /// Scores all previously added nodes for starting a `service` on.
+ /// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
///
- /// Scoring is done according to the static memory and CPU usages of the nodes as if the
- /// service would already be running on each.
- ///
- /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
- /// score is better.
- ///
- /// See [`proxmox_resource_scheduling::pve_static::score_nodes_to_start_service`].
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
#[export]
pub fn score_nodes_to_start_service(
#[try_from_ref] this: &Scheduler,
- service: StaticServiceUsage,
+ service_stats: StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
let usage = this.inner.lock().unwrap();
- let nodes = usage
- .nodes
- .values()
- .map(|node| {
- let mut node_usage = StaticNodeUsage {
- name: node.name.to_string(),
- cpu: 0.0,
- maxcpu: node.maxcpu,
- mem: 0,
- maxmem: node.maxmem,
- };
- for service in node.services.values() {
- node_usage.add_service_usage(service);
- }
-
- node_usage
- })
- .collect::<Vec<StaticNodeUsage>>();
-
- proxmox_resource_scheduling::pve_static::score_nodes_to_start_service(&nodes, &service)
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_nodes_to_start_resource(service_stats)
}
}
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
new file mode 100644
index 0000000..91d56b9
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -0,0 +1,44 @@
+use anyhow::{Error, bail};
+use proxmox_resource_scheduling::resource::{
+ Resource, ResourcePlacement, ResourceState, ResourceStats,
+};
+
+use serde::{Deserialize, Serialize};
+
+/// A PVE resource.
+#[derive(Serialize, Deserialize)]
+pub struct PveResource<T: Into<ResourceStats>> {
+ /// The resource's usage statistics.
+ stats: T,
+ /// Whether the resource is running.
+ running: bool,
+ /// The resource's current node.
+ current_node: Option<String>,
+ /// The resource's optional migration target node.
+ target_node: Option<String>,
+}
+
+impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
+ type Error = Error;
+
+ fn try_from(resource: PveResource<T>) -> Result<Self, Error> {
+ let state = if resource.running {
+ ResourceState::Started
+ } else {
+ ResourceState::Starting
+ };
+
+ let placement = match (resource.current_node, resource.target_node) {
+ (Some(current_node), Some(target_node)) => ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ },
+ (Some(current_node), None) | (None, Some(current_node)) => {
+ ResourcePlacement::Stationary { current_node }
+ }
+ _ => bail!("neither current_node nor target_node are set"),
+ };
+
+ Ok(Resource::new(resource.stats.into(), state, placement))
+ }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
new file mode 100644
index 0000000..fc8b872
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -0,0 +1,33 @@
+use proxmox_resource_scheduling::{
+ scheduler::NodeUsage,
+ usage::{Usage, UsageAggregator},
+};
+
+/// An aggregator, which adds any resource as a started resource.
+///
+/// This aggregator is useful if the node base stats do not have any current usage.
+pub(crate) struct StartedResourceAggregator;
+
+impl UsageAggregator for StartedResourceAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| {
+ let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
+ let mut node_stats = node_stats;
+
+ if let Some(resource) = usage.get_resource(sid) {
+ node_stats.add_started_resource(&resource.stats());
+ }
+
+ node_stats
+ });
+
+ NodeUsage {
+ name: nodename.to_string(),
+ stats,
+ }
+ })
+ .collect()
+ }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-27 14:13 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:13 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm modulo the nits below
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The proxmox_resource_scheduling crate provides a generic usage
> implementation, which is backwards compatible with the pve_static
> bindings. This reduces the static resource scheduling bindings to a
> slightly thinner wrapper.
a good measure to make proxmox-resource-scheduling handle the usage tracking
>
> This also exposes the new `add_resource(...)` binding, which allows
> callers to add services with additional state other than the usage
> stats. It is exposed as `add_service(...)` to be consistent with the
> naming of the rest of the existing methods.
>
> Where it is sensible for the bindings, the documentation is extended
> with a link to the documentation of the underlying methods.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - add patch message for context
> - change from only creating the
> proxmox_resource_scheduling::scheduler::ClusterUsage (now,
> proxmox_resource_scheduling::scheduler::Scheduler), to using the new
> but backwards-compatible `Usage` implementation instead
> - this essentially also squashes the 'store services stats independently
> of node' patch in here as this is also tracked by the generic `Usage`
> impl
> - add `usage` and `resource` modules for shared code
[snip]
> +
> +impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
> + type Error = Error;
> +
> + fn try_from(resource: PveResource<T>) -> Result<Self, Error> {
> + let state = if resource.running {
> + ResourceState::Started
> + } else {
> + ResourceState::Starting
> + };
> +
> + let placement = match (resource.current_node, resource.target_node) {
as it came up off-list, we might want to prohibit current_node and
target_node being equal not only in proxmox-resource-scheduling, but
also here
> + (Some(current_node), Some(target_node)) => ResourcePlacement::Moving {
> + current_node,
> + target_node,
> + },
> + (Some(current_node), None) | (None, Some(current_node)) => {
it would be good to have a comment (// NOTE: ...) explaining the
reasoning behind this arm's code
> + ResourcePlacement::Stationary { current_node }
> + }
> + _ => bail!("neither current_node nor target_node are set"),
> + };
> +
> + Ok(Resource::new(resource.stats.into(), state, placement))
> + }
> +}
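The guard suggested above could look roughly like the following sketch. The enum and helper here are simplified stand-ins for illustration, not the series' actual ResourcePlacement/Resource types:

```rust
// Sketch: reject current_node == target_node when building the placement.
#[derive(Debug, PartialEq)]
enum Placement {
    Moving { current_node: String, target_node: String },
    Stationary { current_node: String },
}

fn placement(
    current_node: Option<String>,
    target_node: Option<String>,
) -> Result<Placement, String> {
    match (current_node, target_node) {
        // new guard arm: both nodes set, but equal
        (Some(c), Some(t)) if c == t => {
            Err(format!("current_node and target_node are both '{c}'"))
        }
        (Some(c), Some(t)) => Ok(Placement::Moving { current_node: c, target_node: t }),
        // either one set: the resource is stationary on that node
        (Some(c), None) | (None, Some(c)) => Ok(Placement::Stationary { current_node: c }),
        (None, None) => Err("neither current_node nor target_node are set".into()),
    }
}

fn main() {
    assert!(placement(Some("a".into()), Some("a".into())).is_err());
    assert_eq!(
        placement(Some("a".into()), None),
        Ok(Placement::Stationary { current_node: "a".into() })
    );
}
```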
[snip]
> +impl UsageAggregator for StartedResourceAggregator {
> + fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
> + usage
> + .nodes_iter()
> + .map(|(nodename, node)| {
nice fold!
nit: by making `node_stats` mutable in the first place, variable
shadowing can be avoided, see:
let stats = node.resources_iter().fold(node.stats(), |mut node_stats, sid| {
if let Some(resource) = usage.get_resource(sid) {
node_stats.add_started_resource(&resource.stats());
}
node_stats
});
> + let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
> + let mut node_stats = node_stats;
> +
> + if let Some(resource) = usage.get_resource(sid) {
> + node_stats.add_started_resource(&resource.stats());
> + }
> +
> + node_stats
> + });
> +
> + NodeUsage {
> + name: nodename.to_string(),
> + stats,
> + }
> + })
> + .collect()
> + }
> +}
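As a standalone illustration of the fold nit (a generic accumulator, not the crate's code): binding the accumulator as `mut` in the closure signature avoids re-declaring it via shadowing, with identical behavior.

```rust
fn main() {
    let usages = [1u64, 2, 3];

    // Shadowing variant (as in the patch): rebind the accumulator mutably.
    let total_shadowed = usages.iter().fold(0u64, |acc, u| {
        let mut acc = acc;
        acc += u;
        acc
    });

    // `mut` binding variant (as suggested): mutable from the start.
    let total = usages.iter().fold(0u64, |mut acc, u| {
        acc += u;
        acc
    });

    assert_eq!(total_shadowed, total);
    assert_eq!(total, 6);
}
```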
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (12 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:18 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
` (25 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The StaticServiceUsage is now marked as deprecated in
proxmox-resource-scheduling to make the crate independent of the
specific usage structs and their deserialization.
Therefore, define the same struct in the pve_static bindings module.
Though this is technically a Rust API break, the Perl bindings do not
have the concept of structs, which are serialized as Perl hashes.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
Move towards handling (de)serialization in pve-rs and having the generic
impls in the proxmox-resource-scheduling crate.
.../resource_scheduling/pve_static.rs | 32 ++++++++++++++++---
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index 3d9f142..e2756db 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -9,10 +9,11 @@ pub mod pve_rs_resource_scheduling_static {
use std::sync::Mutex;
use anyhow::Error;
+ use serde::{Deserialize, Serialize};
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
- use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+ use proxmox_resource_scheduling::resource::ResourceStats;
use proxmox_resource_scheduling::usage::Usage;
use crate::bindings::resource_scheduling::{
@@ -26,7 +27,28 @@ pub mod pve_rs_resource_scheduling_static {
inner: Mutex<Usage>,
}
- type StaticResource = PveResource<StaticServiceUsage>;
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Static usage stats of a resource.
+ pub struct StaticResourceStats {
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<StaticResourceStats> for ResourceStats {
+ fn from(stats: StaticResourceStats) -> Self {
+ Self {
+ cpu: stats.maxcpu,
+ maxcpu: stats.maxcpu,
+ mem: stats.maxmem,
+ maxmem: stats.maxmem,
+ }
+ }
+ }
+
+ type StaticResource = PveResource<StaticResourceStats>;
/// Class method: Create a new [`Scheduler`] instance.
///
@@ -113,13 +135,13 @@ pub mod pve_rs_resource_scheduling_static {
#[try_from_ref] this: &Scheduler,
nodename: &str,
sid: &str,
- service_usage: StaticServiceUsage,
+ service_stats: StaticResourceStats,
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
// TODO Only for backwards compatibility, can be removed with a proper version bump
#[allow(deprecated)]
- usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
+ usage.add_resource_usage_to_node(nodename, sid, service_stats.into())
}
/// Method: Remove service `sid` and its usage from all assigned nodes.
@@ -138,7 +160,7 @@ pub mod pve_rs_resource_scheduling_static {
#[export]
pub fn score_nodes_to_start_service(
#[try_from_ref] this: &Scheduler,
- service_stats: StaticServiceUsage,
+ service_stats: StaticResourceStats,
) -> Result<Vec<(String, f64)>, Error> {
let usage = this.inner.lock().unwrap();
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
@ 2026-03-27 14:18 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:18 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The StaticServiceUsage is now marked as deprecated in
> proxmox-resource-scheduling to make the crate independent of the
> specific usage structs and their deserialization.
>
> Therefore, define the same struct in the pve_static bindings module.
>
> Though this is technically a Rust API break, the Perl bindings do not
> have the concept of structs, which are serialized as Perl hashes.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (13 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:15 ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
` (24 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The implementation is similar to pve_static, but extends the node and
resource stats with sampled runtime usage statistics, i.e., the actual
usage on the nodes and the actual usage of the resources.
If users repeatedly call score_nodes_to_start_resource() and then add
the scored resources as starting resources with add_resource(), these
starting resources need to be accumulated on top of the nodes' actual
current usage to prevent score_nodes_to_start_resource() from favoring
the currently least loaded node(s) for all starting resources.
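The accumulation rationale can be sketched with a minimal, self-contained
example (Stats and best_node() are simplified stand-ins, not the crate's
actual API):

```rust
#[derive(Clone, Copy)]
struct Stats {
    cpu: f64,
}

// Score nodes by projected CPU load; lower is better. Resources that are
// already "starting" on a node are accumulated onto that node's base load
// before the incoming resource is considered.
fn best_node(nodes: &[(String, Stats)], starting: &[(String, Stats)], incoming: Stats) -> String {
    nodes
        .iter()
        .map(|(name, stats)| {
            let pending: f64 = starting
                .iter()
                .filter(|(node, _)| node == name)
                .map(|(_, s)| s.cpu)
                .sum();
            (name.clone(), stats.cpu + pending + incoming.cpu)
        })
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
        .unwrap()
}

fn main() {
    let nodes = vec![
        ("node1".to_string(), Stats { cpu: 0.1 }),
        ("node2".to_string(), Stats { cpu: 0.5 }),
    ];
    let incoming = Stats { cpu: 0.6 };

    // First placement goes to the least loaded node.
    let first = best_node(&nodes, &[], incoming);
    assert_eq!(first, "node1");

    // With the first resource tracked as "starting" on node1, the second
    // placement correctly prefers node2; without accumulation, both
    // resources would have landed on node1.
    let starting = vec![(first.clone(), incoming)];
    let second = best_node(&nodes, &starting, incoming);
    assert_eq!(second, "node2");

    println!("{first} {second}");
}
```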
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- move this patch one before 'expose auto rebalancing methods' as this
is the same change order as done in pve-ha-manager, making it easier
to separate the feature of using dynamic usage information and
afterwards allowing rebalancing methods with static and dynamic usage
information
- adapt patch message accordingly
- s/service/resource/ for any new struct and method as this is more
consistent with the naming in the HA Manager and the name of the
crate/module itself; can change this back if the old naming is
preferred, but as these are new API endpoints, I thought it's better to
do it now than later
pve-rs/Makefile | 1 +
.../src/bindings/resource_scheduling/mod.rs | 3 +
.../resource_scheduling/pve_dynamic.rs | 174 ++++++++++++++++++
.../src/bindings/resource_scheduling/usage.rs | 33 ++++
pve-rs/test/resource_scheduling.pl | 1 +
5 files changed, 212 insertions(+)
create mode 100644 pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
diff --git a/pve-rs/Makefile b/pve-rs/Makefile
index 9faa735..f0212b7 100644
--- a/pve-rs/Makefile
+++ b/pve-rs/Makefile
@@ -30,6 +30,7 @@ PERLMOD_PACKAGES := \
PVE::RS::OCI \
PVE::RS::OpenId \
PVE::RS::ResourceScheduling::Static \
+ PVE::RS::ResourceScheduling::Dynamic \
PVE::RS::SDN::Fabrics \
PVE::RS::TFA
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index 9ce631c..87b4a03 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -5,3 +5,6 @@ mod usage;
mod pve_static;
pub use pve_static::pve_rs_resource_scheduling_static;
+
+mod pve_dynamic;
+pub use pve_dynamic::pve_rs_resource_scheduling_dynamic;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
new file mode 100644
index 0000000..5b4373e
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -0,0 +1,174 @@
+#[perlmod::package(name = "PVE::RS::ResourceScheduling::Dynamic", lib = "pve_rs")]
+pub mod pve_rs_resource_scheduling_dynamic {
+ //! The `PVE::RS::ResourceScheduling::Dynamic` package.
+ //!
+ //! Provides bindings for the dynamic resource scheduling module.
+ //!
+ //! See [`proxmox_resource_scheduling`].
+
+ use std::sync::Mutex;
+
+ use anyhow::Error;
+ use serde::{Deserialize, Serialize};
+
+ use perlmod::Value;
+ use proxmox_resource_scheduling::node::NodeStats;
+ use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::usage::Usage;
+
+ use crate::bindings::resource_scheduling::resource::PveResource;
+ use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+
+ perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
+
+ /// A scheduler instance contains the cluster usage.
+ pub struct Scheduler {
+ inner: Mutex<Usage>,
+ }
+
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Dynamic usage stats of a node.
+ pub struct DynamicNodeStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Total number of CPU cores.
+ pub maxcpu: usize,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Total memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<DynamicNodeStats> for NodeStats {
+ fn from(value: DynamicNodeStats) -> Self {
+ Self {
+ cpu: value.cpu,
+ maxcpu: value.maxcpu,
+ mem: value.mem,
+ maxmem: value.maxmem,
+ }
+ }
+ }
+
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Dynamic usage stats of a resource.
+ pub struct DynamicResourceStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<DynamicResourceStats> for ResourceStats {
+ fn from(value: DynamicResourceStats) -> Self {
+ Self {
+ cpu: value.cpu,
+ maxcpu: value.maxcpu,
+ mem: value.mem,
+ maxmem: value.maxmem,
+ }
+ }
+ }
+
+ type DynamicResource = PveResource<DynamicResourceStats>;
+
+ /// Class method: Create a new [`Scheduler`] instance.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::new`].
+ #[export(raw_return)]
+ pub fn new(#[raw] class: Value) -> Result<Value, Error> {
+ let inner = Usage::new();
+
+ Ok(perlmod::instantiate_magic!(
+ &class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
+ ))
+ }
+
+ /// Method: Add a node with its basic CPU and memory info.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
+ #[export]
+ pub fn add_node(
+ #[try_from_ref] this: &Scheduler,
+ nodename: String,
+ stats: DynamicNodeStats,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_node(nodename, stats.into())
+ }
+
+ /// Method: Remove a node from the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
+ #[export]
+ pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.remove_node(nodename);
+ }
+
+ /// Method: Get a list of all the nodes in the scheduler.
+ #[export]
+ pub fn list_nodes(#[try_from_ref] this: &Scheduler) -> Vec<String> {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .nodenames_iter()
+ .map(|nodename| nodename.to_string())
+ .collect()
+ }
+
+ /// Method: Check whether a node exists in the scheduler.
+ #[export]
+ pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
+ let usage = this.inner.lock().unwrap();
+
+ usage.contains_node(nodename)
+ }
+
+ /// Method: Add `resource` with identifier `sid` to the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+ #[export]
+ pub fn add_resource(
+ #[try_from_ref] this: &Scheduler,
+ sid: String,
+ resource: DynamicResource,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_resource(sid, resource.try_into()?)
+ }
+
+ /// Method: Remove resource `sid` and its usage from all assigned nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
+ #[export]
+ fn remove_resource(#[try_from_ref] this: &Scheduler, sid: &str) {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.remove_resource(sid);
+ }
+
+ /// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
+ #[export]
+ pub fn score_nodes_to_start_resource(
+ #[try_from_ref] this: &Scheduler,
+ resource_stats: DynamicResourceStats,
+ ) -> Result<Vec<(String, f64)>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .to_scheduler::<StartingAsStartedResourceAggregator>()
+ .score_nodes_to_start_resource(resource_stats)
+ }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index fc8b872..87b7e3e 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -1,4 +1,5 @@
use proxmox_resource_scheduling::{
+ resource::ResourceState,
scheduler::NodeUsage,
usage::{Usage, UsageAggregator},
};
@@ -31,3 +32,35 @@ impl UsageAggregator for StartedResourceAggregator {
.collect()
}
}
+
+/// An aggregator, which uses the node base stats and adds any starting resources as already
+/// started resources to the node stats.
+///
+/// This aggregator is useful if starting resources should be considered in the scheduler.
+pub(crate) struct StartingAsStartedResourceAggregator;
+
+impl UsageAggregator for StartingAsStartedResourceAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| {
+ let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
+ let mut node_stats = node_stats;
+
+ if let Some(resource) = usage.get_resource(sid)
+ && resource.state() == ResourceState::Starting
+ {
+ node_stats.add_started_resource(&resource.stats());
+ }
+
+ node_stats
+ });
+
+ NodeUsage {
+ name: nodename.to_string(),
+ stats,
+ }
+ })
+ .collect()
+ }
+}
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
index a332269..3775242 100755
--- a/pve-rs/test/resource_scheduling.pl
+++ b/pve-rs/test/resource_scheduling.pl
@@ -6,6 +6,7 @@ use warnings;
use Test::More;
use PVE::RS::ResourceScheduling::Static;
+use PVE::RS::ResourceScheduling::Dynamic;
my sub score_nodes {
my ($static, $service) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* Re: [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
@ 2026-03-27 14:15 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:15 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, modulo one nit to consider
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The implementation is similar to pve_static, but extends the node and
> resource stats with sampled runtime usage statistics, i.e., the actual
> usage on the nodes and the actual usages of the resources.
>
> If users repeatedly call score_nodes_to_start_resource() and then add
> the scored resources as starting resources with add_resource(), these
> starting resources need to be accumulated on top of the nodes' actual
> current usage to prevent score_nodes_to_start_resource() from favoring
> the currently least loaded node(s) for all starting resources.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - move this patch one before 'expose auto rebalancing methods' as this
> is the same change order as done in pve-ha-manager, making it easier
> to separate the feature of using dynamic usage information and
> afterwards allowing rebalancing methods with static and dynamic usage
> information
> - adapt patch message accordingly
> - s/service/resource/ for any new struct and method as this is more
> consistent with the naming in the HA Manager and the name of the
> crate/module itself; can change this back if it's better in the other
> way, but as these are new API endpoints, I thought it's better to do
> it now than later
>
[snip]
> +impl UsageAggregator for StartingAsStartedResourceAggregator {
> + fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
> + usage
> + .nodes_iter()
> + .map(|(nodename, node)| {
nice fold!
nit: by making `node_stats` mutable in the first place, variable
shadowing can be avoided, see:
let stats = node.resources_iter().fold(node.stats(), |mut node_stats, sid| {
    if let Some(resource) = usage.get_resource(sid)
        && resource.state() == ResourceState::Starting
    {
        node_stats.add_started_resource(&resource.stats());
    }
    node_stats
});
> + let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
> + let mut node_stats = node_stats;
> +
> + if let Some(resource) = usage.get_resource(sid)
> + && resource.state() == ResourceState::Starting
> + {
> + node_stats.add_started_resource(&resource.stats());
> + }
> +
> + node_stats
> + });
> +
> + NodeUsage {
> + name: nodename.to_string(),
> + stats,
> + }
> + })
> + .collect()
> + }
> +}
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (14 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-27 14:16 ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
` (23 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These bindings expose the auto rebalancing methods of both the static
and the dynamic scheduler.
As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
takes a possibly very large list of migration candidates, the binding
accepts a more compact representation, which reduces the size of the
data that needs to be generated on the caller's side and therefore the
runtime of the serialization from Perl to Rust.
Additionally, while decomposing the compact representation, the input
data is validated, since the underlying scoring methods do not further
validate whether their input is consistent with the cluster usage.
The method names score_best_balancing_migration_candidates{,_topsis}()
are chosen deliberately, so that future extensions can implement
score_best_balancing_migrations{,_topsis}(), which might allow scoring
migrations without providing the candidates.
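The expansion performed by decompose_compact_migration_candidates() can
be sketched with simplified stand-in types (the real binding additionally
validates each resource against the cluster usage, looks up the leader's
node there, and sums the bundle's stats):

```rust
// Simplified stand-ins; the real crate's Migration/MigrationCandidate
// also carry the bundle's accumulated resource stats.
#[derive(Debug)]
struct Migration {
    sid: String,
    source_node: String,
    target_node: String,
}

struct Compact {
    leader: String,
    // Looked up from the cluster usage in the real binding.
    source_node: String,
    nodes: Vec<String>,
}

// Expand one compact candidate into one migration per target node.
fn decompose(compact: &Compact) -> Vec<Migration> {
    compact
        .nodes
        .iter()
        .map(|target| Migration {
            sid: compact.leader.clone(),
            source_node: compact.source_node.clone(),
            target_node: target.clone(),
        })
        .collect()
}

fn main() {
    let compact = Compact {
        leader: "vm:100".to_string(),
        source_node: "node1".to_string(),
        nodes: vec!["node2".to_string(), "node3".to_string()],
    };
    let migrations = decompose(&compact);
    assert_eq!(migrations.len(), 2);
    assert_eq!(migrations[0].target_node, "node2");
    assert_eq!(migrations[1].source_node, "node1");
    println!("{migrations:?}");
}
```

This is why the compact form saves serialization work: the caller sends
each leader and its node list once instead of one fully expanded
candidate per (leader, target node) pair.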
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- improve patch message and documentation
- move to the end of the perl-rs changes, which makes it more consistent
with the change order in pve-ha-manager as well
- uses `UsageAggregator` now to discern how usages are accumulated
- s/generate_migration_candidates_from
/decompose_compact_migration_candidates
- make the decomposition of compact migration candidates more robust and
do not use any unwraps or other causes of panic but the Mutex guard
unwrap
.../resource_scheduling/pve_dynamic.rs | 57 +++++++++++-
.../resource_scheduling/pve_static.rs | 56 +++++++++++-
.../bindings/resource_scheduling/resource.rs | 88 ++++++++++++++++++-
.../src/bindings/resource_scheduling/usage.rs | 15 ++++
4 files changed, 211 insertions(+), 5 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
index 5b4373e..26f36d1 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -14,10 +14,15 @@ pub mod pve_rs_resource_scheduling_dynamic {
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::scheduler::ScoredMigration;
use proxmox_resource_scheduling::usage::Usage;
- use crate::bindings::resource_scheduling::resource::PveResource;
- use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+ use crate::bindings::resource_scheduling::resource::{
+ CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+ };
+ use crate::bindings::resource_scheduling::usage::{
+ IdentityAggregator, StartingAsStartedResourceAggregator,
+ };
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
@@ -157,6 +162,54 @@ pub mod pve_rs_resource_scheduling_dynamic {
usage.remove_resource(sid);
}
+ /// Method: Returns the load imbalance among the nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+ #[export]
+ pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+ let usage = this.inner.lock().unwrap();
+
+ usage.to_scheduler::<IdentityAggregator>().node_imbalance()
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ Ok(usage
+ .to_scheduler::<IdentityAggregator>()
+ .score_best_balancing_migration_candidates(candidates, limit))
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// the TOPSIS method.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates_topsis(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ usage
+ .to_scheduler::<IdentityAggregator>()
+ .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+ }
+
/// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index e2756db..7924889 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -14,10 +14,14 @@ pub mod pve_rs_resource_scheduling_static {
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::scheduler::ScoredMigration;
use proxmox_resource_scheduling::usage::Usage;
use crate::bindings::resource_scheduling::{
- resource::PveResource, usage::StartedResourceAggregator,
+ resource::{
+ CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+ },
+ usage::StartedResourceAggregator,
};
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
@@ -154,6 +158,56 @@ pub mod pve_rs_resource_scheduling_static {
usage.remove_resource(sid);
}
+ /// Method: Returns the load imbalance among the nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+ #[export]
+ pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .node_imbalance()
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ Ok(usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_best_balancing_migration_candidates(candidates, limit))
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// the TOPSIS method.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates_topsis(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+ }
+
/// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
///
/// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
index 91d56b9..9186d5b 100644
--- a/pve-rs/src/bindings/resource_scheduling/resource.rs
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -1,6 +1,8 @@
use anyhow::{Error, bail};
-use proxmox_resource_scheduling::resource::{
- Resource, ResourcePlacement, ResourceState, ResourceStats,
+use proxmox_resource_scheduling::{
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ scheduler::{Migration, MigrationCandidate},
+ usage::Usage,
};
use serde::{Deserialize, Serialize};
@@ -42,3 +44,85 @@ impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
Ok(Resource::new(resource.stats.into(), state, placement))
}
}
+
+/// A compact representation of [`proxmox_resource_scheduling::scheduler::MigrationCandidate`].
+#[derive(Serialize, Deserialize)]
+pub struct CompactMigrationCandidate {
+ /// The identifier of the leading resource.
+ pub leader: String,
+ /// The resources which are part of the leading resource's bundle.
+ pub resources: Vec<String>,
+ /// The nodes, which are possible to migrate to for the resources.
+ pub nodes: Vec<String>,
+}
+
+/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
+/// usage from `usage`.
+///
+/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
+///
+/// - the `leader` is not present in the cluster usage
+/// - the `leader` is non-stationary
+/// - any resource in `resources` is not present in the cluster usage
+/// - any resource in `resources` is non-stationary
+/// - any resource in `resources` is on another node than the `leader`
+pub(crate) fn decompose_compact_migration_candidates(
+ usage: &Usage,
+ compact_candidates: Vec<CompactMigrationCandidate>,
+) -> Result<Vec<MigrationCandidate>, Error> {
+ // The length of `compact_candidates` is at least a lower bound
+ let mut candidates = Vec::with_capacity(compact_candidates.len());
+
+ for candidate in compact_candidates.into_iter() {
+ let leader_sid = candidate.leader;
+ let leader = match usage.get_resource(&leader_sid) {
+ Some(resource) => resource,
+ _ => bail!("leader '{leader_sid}' is not present in the cluster usage"),
+ };
+ let leader_node = match leader.placement() {
+ ResourcePlacement::Stationary { current_node } => current_node,
+ _ => bail!("leader '{leader_sid}' is non-stationary"),
+ };
+
+ if !candidate.resources.contains(&leader_sid) {
+ bail!("leader '{leader_sid}' is not present in the resources list");
+ }
+
+ let mut resource_stats = Vec::with_capacity(candidate.resources.len());
+
+ for sid in candidate.resources.iter() {
+ let resource = match usage.get_resource(sid) {
+ Some(resource) => resource,
+ _ => bail!("resource '{sid}' is not present in the cluster usage"),
+ };
+
+ match resource.placement() {
+ ResourcePlacement::Stationary { current_node } => {
+ if current_node != leader_node {
+ bail!("resource '{sid}' is on another node than the leader");
+ }
+
+ resource_stats.push(resource.stats());
+ }
+ _ => bail!("resource '{sid}' is non-stationary"),
+ }
+ }
+
+ let bundle_stats = resource_stats.into_iter().sum();
+
+ for target_node in candidate.nodes.into_iter() {
+ let migration = Migration {
+ sid: leader_sid.to_string(),
+ source_node: leader_node.to_string(),
+ target_node,
+ };
+
+ candidates.push(MigrationCandidate {
+ migration,
+ stats: bundle_stats,
+ });
+ }
+ }
+
+ Ok(candidates)
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index 87b7e3e..48f6e84 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -4,6 +4,21 @@ use proxmox_resource_scheduling::{
usage::{Usage, UsageAggregator},
};
+/// The identity aggregator, which passes the node stats as-is.
+pub(crate) struct IdentityAggregator;
+
+impl UsageAggregator for IdentityAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| NodeUsage {
+ name: nodename.to_string(),
+ stats: node.stats(),
+ })
+ .collect()
+ }
+}
+
/// An aggregator, which adds any resource as a started resource.
///
/// This aggregator is useful if the node base stats do not have any current usage.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* Re: [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
@ 2026-03-27 14:16 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:16 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:30 PM CET, Daniel Kral wrote:
> These methods expose the auto rebalancing methods of both the static and
> dynamic scheduler.
>
> As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
> takes a possibly very large list of migration candidates, the binding
> accepts a more compact representation, which reduces the size of the
> data that needs to be generated on the caller's side and therefore the
> runtime of the serialization from Perl to Rust.
>
> Additionally, while decomposing the compact representation the input
> data is validated since the underlying scoring methods do not further
> validate whether their input is consistent with the cluster usage.
>
> The method names score_best_balancing_migration_candidates{,_topsis}()
> are chosen deliberately, so that future extensions can implement
> score_best_balancing_migrations{,_topsis}(), which might allow scoring
> migrations without providing the candidates.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - improve patch message and documentation
> - move to the end of the perl-rs changes, which makes it more consistent
> with the change order in pve-ha-manager as well
> - uses `UsageAggregator` now to discern how usages are accumulated
> - s/generate_migration_candidates_from
> /decompose_compact_migration_candidates
> - make the decomposition of compact migration candidates more robust and
> do not use any unwraps or other causes of panic but the Mutex guard
> unwrap
>
[snip]
> +/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
> +/// usage from `usage`.
> +///
> +/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
> +///
> +/// - the `leader` is not present in the cluster usage
> +/// - the `leader` is non-stationary
> +/// - any resource in `resources` is not present in the cluster usage
> +/// - any resource in `resources` is non-stationary
> +/// - any resource in `resources` is on another node than the `leader`
nice idea, to use Vec::with_capacity in here
> +pub(crate) fn decompose_compact_migration_candidates(
> + usage: &Usage,
> + compact_candidates: Vec<CompactMigrationCandidate>,
> +) -> Result<Vec<MigrationCandidate>, Error> {
> + // The length of `compact_candidates` is at least a lower bound
> + let mut candidates = Vec::with_capacity(compact_candidates.len());
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (15 preceding siblings ...)
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
` (22 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
This makes it a little easier to read and allows appending descriptions
for other values with a cleaner diff.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/DataCenterConfig.pm | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index d88b167..e7bc8f1 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -17,9 +17,12 @@ my $crs_format = {
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
- verbose_description => "Configures how the HA manager should select nodes to start or "
- . "recover services. With 'basic', only the number of services is used, with 'static', "
- . "static CPU and memory configuration of services is considered.",
+ verbose_description => <<EODESC,
+Configures how the HA Manager should select nodes to start or recover services:
+
+- with 'basic', only the number of services is used,
+- with 'static', static CPU and memory configuration of services is considered.
+EODESC
},
'ha-rebalance-on-start' => {
type => 'boolean',
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (16 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
` (21 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- slightly changed wording (as suggested by @Maximiliano)
src/PVE/DataCenterConfig.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index e7bc8f1..396c962 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -13,7 +13,7 @@ my $PROXMOX_OUI = 'BC:24:11';
my $crs_format = {
ha => {
type => 'string',
- enum => ['basic', 'static'],
+ enum => ['basic', 'static', 'dynamic'],
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
@@ -21,7 +21,8 @@ my $crs_format = {
Configures how the HA Manager should select nodes to start or recover services:
- with 'basic', only the number of services is used,
-- with 'static', static CPU and memory configuration of services is considered.
+- with 'static', static CPU and memory configuration of services is considered,
+- with 'dynamic', static and dynamic CPU and memory usage of services is considered.
EODESC
},
'ha-rebalance-on-start' => {
--
2.47.3
* [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (17 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-26 16:08 ` Jillian Morgan
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
` (20 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- slightly changed wording here (as suggested by @Maximiliano)
src/PVE/DataCenterConfig.pm | 39 +++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 396c962..52682aa 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -33,6 +33,45 @@ EODESC
"Set to use CRS for selecting a suited node when a HA services request-state"
. " changes from stop to start.",
},
+ 'ha-auto-rebalance' => {
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ description => "Whether to use CRS for balancing HA resources automatically"
+ . " depending on the current node imbalance.",
+ },
+ 'ha-auto-rebalance-threshold' => {
+ type => 'number',
+ optional => 1,
+ default => 0.7,
+ requires => 'ha-auto-rebalance',
+ description => "The threshold for the node load, which will trigger the automatic"
+ . " resource balancing system if its value is exceeded.",
+ },
+ 'ha-auto-rebalance-method' => {
+ type => 'string',
+ enum => ['bruteforce', 'topsis'],
+ optional => 1,
+ default => 'bruteforce',
+ requires => 'ha-auto-rebalance',
+ description => "The method to use for the scoring of rebalancing migrations.",
+ },
+ 'ha-auto-rebalance-hold-duration' => {
+ type => 'number',
+ optional => 1,
+ default => 3,
+ requires => 'ha-auto-rebalance',
+ description => "The duration the threshold must be exceeded for to trigger an automatic"
+ . " resource balancing migration in HA rounds.",
+ },
+ 'ha-auto-rebalance-margin' => {
+ type => 'number',
+ optional => 1,
+ default => 0.1,
+ requires => 'ha-auto-rebalance',
+ description => "The minimum relative improvement in cluster node imbalance to commit to"
+ . " a resource rebalancing migration.",
+ },
};
my $migration_format = {
--
2.47.3
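The threshold and margin options above interact roughly as follows. This is an illustrative Python sketch of the config semantics described in the schema, not the actual Perl implementation; the function name and the scalar "imbalance" measure are assumptions:

```python
def should_commit_rebalance(current_imbalance, new_imbalance, margin=0.1):
    """Commit a candidate migration only if it improves the cluster node
    imbalance by at least the relative margin (cf. 'ha-auto-rebalance-margin',
    default 0.1). The imbalance metric itself is assumed, not from the patch."""
    if current_imbalance <= 0:
        return False  # nothing to improve
    improvement = (current_imbalance - new_imbalance) / current_imbalance
    return improvement >= margin
```

For example, reducing the imbalance from 1.0 to 0.85 (a 15% relative improvement) clears the default 10% margin, while 1.0 to 0.95 does not.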
* Re: [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-26 16:08 ` Jillian Morgan
2026-03-26 16:20 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Jillian Morgan @ 2026-03-26 16:08 UTC (permalink / raw)
To: Daniel Kral; +Cc: pve-devel
On Tue, Mar 24, 2026 at 2:34 PM Daniel Kral <d.kral@proxmox.com> wrote:
> + 'ha-auto-rebalance-hold-duration' => {
> + type => 'number',
> + optional => 1,
> + default => 3,
> + requires => 'ha-auto-rebalance',
> + description => "The duration the threshold must be exceeded for
> to trigger an automatic"
> + . " resource balancing migration in HA rounds.",
> + },
>
>
What are the units of these duration numbers? Milliseconds or days? ;-)
Perhaps it is the "HA rounds" part that is key here but the statement is
unclear to me. Is that a duration, or a discrete number of events? How long
is each "HA round"?
Perhaps a clarification like this: "The number of HA Rounds for which
the ha-auto-rebalance-threshold must be exceeded before triggering an
automatic resource balancing migration."
And perhaps an additional hint could be provided that an HA Round is "10
seconds" (I think?)
-- Jillian
* Re: [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-26 16:08 ` Jillian Morgan
@ 2026-03-26 16:20 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 16:20 UTC (permalink / raw)
To: Jillian Morgan; +Cc: pve-devel
On Thu Mar 26, 2026 at 5:08 PM CET, Jillian Morgan wrote:
> On Tue, Mar 24, 2026 at 2:34 PM Daniel Kral <d.kral@proxmox.com> wrote:
>
>> + 'ha-auto-rebalance-hold-duration' => {
>> + type => 'number',
>> + optional => 1,
>> + default => 3,
>> + requires => 'ha-auto-rebalance',
>> + description => "The duration the threshold must be exceeded for
>> to trigger an automatic"
>> + . " resource balancing migration in HA rounds.",
>> + },
>>
>>
> What are the units of these duration numbers? Milliseconds or days? ;-)
> Perhaps it is the "HA rounds" part that is key here but the statement is
> unclear to me. Is that a duration, or a discrete number of events? How long
> is each "HA round"?
>
> Perhaps a clarification like this: "The number of HA Rounds for which
> the ha-auto-rebalance-threshold must be exceeded before triggering an
> automatic resource balancing migration."
> And perhaps an additional hint could be provided that an HA Round is "10
> seconds" (I think?)
Hi Jillian!
Thanks for taking a look!
You're right, there should be more emphasis on the 'HA rounds' part!
I thought about using seconds in v1, but I went with the HA rounds as in
'number of repeated tries' as that measure is a better guarantee.
Putting "The number of HA rounds" at the start makes the 'unit' for this
property also clearer, will change it to that and add a hint about the
length of the HA rounds.
Best regards
Daniel
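The agreed-upon semantics — a count of consecutive HA rounds (each roughly 10 seconds, per the discussion above) during which the threshold is exceeded — can be sketched as follows. This is an illustrative Python analogue, not the actual implementation; the class name and the reset-on-dip behavior are assumptions:

```python
class RebalanceTrigger:
    """Tracks how many consecutive HA rounds the node load has exceeded the
    threshold; fires once the hold duration (in rounds) is reached."""

    def __init__(self, threshold=0.7, hold_rounds=3):
        self.threshold = threshold      # cf. 'ha-auto-rebalance-threshold'
        self.hold_rounds = hold_rounds  # cf. 'ha-auto-rebalance-hold-duration'
        self.exceeded_for = 0

    def observe(self, load):
        """Call once per HA round; returns True when rebalancing should run."""
        if load > self.threshold:
            self.exceeded_for += 1
        else:
            self.exceeded_for = 0  # assumed: any round below threshold resets
        return self.exceeded_for >= self.hold_rounds
```

With the defaults, three consecutive rounds above 0.7 trigger a rebalance, and a single round below the threshold starts the count over.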
* [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (18 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-25 21:43 ` Thomas Lamprecht
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
` (19 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Fetch the dynamic node and service stats with rrd_dump(); these are
periodically sampled and broadcast by the PVE nodes' pvestatd service
and propagated through the pmxcfs.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- use constants for the RRD entry indices
- add note about the capping of the maxcpu property for guests
src/PVE/HA/Env/PVE2.pm | 63 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 04cd1bfe..4dfb304e 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -42,6 +42,19 @@ my $lockdir = "/etc/pve/priv/lock";
# taken from PVE::Service::pvestatd::update_{lxc,qemu}_status()
use constant {
RRD_VM_INDEX_STATUS => 2,
+ RRD_VM_INDEX_MAXCPU => 5,
+ RRD_VM_INDEX_CPU => 6,
+ RRD_VM_INDEX_MAXMEM => 7,
+ RRD_VM_INDEX_MEM => 8,
+};
+
+# rrd entry indices for PVE nodes
+# taken from PVE::Service::pvestatd::update_node_status()
+use constant {
+ RRD_NODE_INDEX_MAXCPU => 4,
+ RRD_NODE_INDEX_CPU => 5,
+ RRD_NODE_INDEX_MAXMEM => 7,
+ RRD_NODE_INDEX_MEM => 8,
};
sub new {
@@ -569,6 +582,30 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self, $id) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = get_cluster_service_stats();
+ for my $sid (keys %$stats) {
+ my $id = $stats->{$sid}->{id};
+ my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
+
+ # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
+ my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
+
+ $stats->{$sid}->{usage} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -588,6 +625,32 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = {};
+ for my $key (keys %$rrd) {
+ my ($nodename) = $key =~ m/^pve-node-9.0\/(\w+)$/;
+
+ next if !$nodename;
+
+ my $rrdentry = $rrd->{$key} // [];
+
+ my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
+
+ $stats->{$nodename} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
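The normalization in get_dynamic_service_stats() — RRD reports a guest's cpu as a fraction of its maxcpu, so multiplying by maxcpu yields absolute cores — can be sketched in Python. Illustrative only: the helper name is an assumption, while the indices mirror the constants in the patch and missing entries default to zero as in the Perl code:

```python
# rrd entry indices for guests, as in the patch
RRD_VM_INDEX_MAXCPU = 5
RRD_VM_INDEX_CPU = 6
RRD_VM_INDEX_MAXMEM = 7
RRD_VM_INDEX_MEM = 8

def vm_usage_from_rrd(entry):
    """Convert one RRD entry (a list that may contain None) into absolute
    usage values: cpu fraction * maxcpu gives cores, memory stays in bytes."""
    maxcpu = float(entry[RRD_VM_INDEX_MAXCPU] or 0.0)
    return {
        "maxcpu": maxcpu,
        "cpu": float(entry[RRD_VM_INDEX_CPU] or 0.0) * maxcpu,
        "maxmem": int(entry[RRD_VM_INDEX_MAXMEM] or 0),
        "mem": int(entry[RRD_VM_INDEX_MEM] or 0),
    }
```

For a guest with maxcpu 4 and an RRD cpu fraction of 0.5, this yields an absolute load of 2.0 cores.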
* Re: [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-25 21:43 ` Thomas Lamprecht
0 siblings, 0 replies; 64+ messages in thread
From: Thomas Lamprecht @ 2026-03-25 21:43 UTC (permalink / raw)
To: Daniel Kral, pve-devel
Am 24.03.26 um 19:31 schrieb Daniel Kral:
> Fetch the dynamic node and service stats with rrd_dump(); these are
> periodically sampled and broadcast by the PVE nodes' pvestatd service
> and propagated through the pmxcfs.
one small code issue inline that can be fixed up too on applying if nothing
else comes up.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - use constants for the RRD entry indices
> - add note about the capping of the maxcpu property for guests
>
> src/PVE/HA/Env/PVE2.pm | 63 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
> index 04cd1bfe..4dfb304e 100644
> --- a/src/PVE/HA/Env/PVE2.pm
> +++ b/src/PVE/HA/Env/PVE2.pm
> @@ -569,6 +582,30 @@ sub get_static_service_stats {
> return $stats;
> }
>
> +sub get_dynamic_service_stats {
> + my ($self, $id) = @_;
this $id param is not directly used and no calling site passes any such
param, and it's also shadowed by the one inside the loop below.
> +
> + my $rrd = PVE::Cluster::rrd_dump();
> +
> + my $stats = get_cluster_service_stats();
> + for my $sid (keys %$stats) {
> + my $id = $stats->{$sid}->{id};
> + my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
> +
> + # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
> + my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
> +
> + $stats->{$sid}->{usage} = {
> + maxcpu => $maxcpu,
> + cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
> + maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
> + mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
> + };
> + }
> +
> + return $stats;
> +}
> +
> sub get_static_node_stats {
> my ($self) = @_;
>
* [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (19 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
` (18 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
CRM expects f64 for cpu-related values and usize for mem-related values.
Hence, pass doubles for the former and ints for the latter.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Sim/Hardware.pm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 59afb44a..9f29fa6c 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -488,9 +488,9 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
+ node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +507,7 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4, maxmem => 4096 } } keys %$services };
+ my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
$self->write_static_service_stats($stats);
}
@@ -874,7 +874,7 @@ sub sim_hardware_cmd {
$self->set_static_service_stats(
$sid,
- { maxcpu => $params[0], maxmem => $params[1] },
+ { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
);
} elsif ($action eq 'delete') {
--
2.47.3
* [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (20 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
` (17 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Sim/Hardware.pm | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 9f29fa6c..47839112 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,6 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_maxcpu = 4.0;
+my $default_service_maxmem = 4096 * 1024**2;
+my $default_node_maxcpu = 24.0;
+my $default_node_maxmem = 131072 * 1024**2;
+
# Status directory layout
#
# configuration
@@ -488,9 +493,24 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node1 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node2 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node3 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +527,12 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
+ my $stats = {
+ map {
+ $_ => { maxcpu => $default_service_maxcpu, maxmem => $default_service_maxmem }
+ }
+ keys %$services
+ };
$self->write_static_service_stats($stats);
}
--
2.47.3
* [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (21 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
` (16 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Values of 0 or 0.0 are falsy but still valid stats. Hence, use a
'defined' check to avoid skipping such static service stats.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- do not skip falsy stats
src/PVE/HA/Sim/Hardware.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 47839112..c167abd7 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -202,11 +202,11 @@ sub set_static_service_stats {
my $stats = $self->read_static_service_stats();
- if (my $memory = $new_stats->{maxmem}) {
+ if (defined(my $memory = $new_stats->{maxmem})) {
$stats->{$sid}->{maxmem} = $memory;
}
- if (my $cpu = $new_stats->{maxcpu}) {
+ if (defined(my $cpu = $new_stats->{maxcpu})) {
$stats->{$sid}->{maxcpu} = $cpu;
}
--
2.47.3
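The difference between the old truthiness guard and the new 'defined' check can be illustrated with a small Python analogue, where `is not None` plays the role of Perl's `defined`; the helper name is an assumption:

```python
def merge_stats(stats, new_stats):
    """Merge new values into stats, skipping only keys that are absent
    (None), not keys whose value happens to be falsy such as 0 or 0.0."""
    for key in ("maxcpu", "maxmem"):
        value = new_stats.get(key)
        if value is not None:  # 'defined' check, not `if value:`
            stats[key] = value
    return stats
```

With a plain `if value:` guard, setting maxcpu to 0.0 would be silently ignored; the explicit None check applies it.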
* [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (22 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
` (15 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
This adds functionality to simulate the dynamic stats of a service, that
is, cpu load (cores) and memory usage (MiB).
Analogous to static service stats, tests can specify dynamic service
stats in the file dynamic_service_stats.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- do not skip falsy stats in set_dynamic_service_stats
src/PVE/HA/Sim/Hardware.pm | 52 ++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index c167abd7..cb4a1504 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,8 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_cpu = 2.0;
my $default_service_maxcpu = 4.0;
+my $default_service_mem = 2048 * 1024**2;
my $default_service_maxmem = 4096 * 1024**2;
+
my $default_node_maxcpu = 24.0;
my $default_node_maxmem = 131072 * 1024**2;
@@ -213,6 +216,25 @@ sub set_static_service_stats {
$self->write_static_service_stats($stats);
}
+sub set_dynamic_service_stats {
+ my ($self, $sid, $new_stats) = @_;
+
+ my $conf = $self->read_service_config();
+ die "no such service '$sid'" if !$conf->{$sid};
+
+ my $stats = $self->read_dynamic_service_stats();
+
+ if (defined(my $memory = $new_stats->{mem})) {
+ $stats->{$sid}->{mem} = $memory;
+ }
+
+ if (defined(my $cpu = $new_stats->{cpu})) {
+ $stats->{$sid}->{cpu} = $cpu;
+ }
+
+ $self->write_dynamic_service_stats($stats);
+}
+
sub add_service {
my ($self, $sid, $opts, $running) = @_;
@@ -438,6 +460,16 @@ sub read_static_service_stats {
return $stats;
}
+sub read_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
+ $self->log('error', "loading dynamic service stats failed - $@") if $@;
+
+ return $stats;
+}
+
sub write_static_service_stats {
my ($self, $stats) = @_;
@@ -446,6 +478,14 @@ sub write_static_service_stats {
$self->log('error', "writing static service stats failed - $@") if $@;
}
+sub write_dynamic_service_stats {
+ my ($self, $stats) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ eval { PVE::HA::Tools::write_json_to_file($filename, $stats) };
+ $self->log('error', "writing dynamic service stats failed - $@") if $@;
+}
+
sub new {
my ($this, $testdir) = @_;
@@ -536,6 +576,18 @@ sub new {
$self->write_static_service_stats($stats);
}
+ if (-f "$testdir/dynamic_service_stats") {
+ copy("$testdir/dynamic_service_stats", "$statusdir/dynamic_service_stats");
+ } else {
+ my $services = $self->read_static_service_stats();
+ my $stats = {
+ map { $_ => { cpu => $default_service_cpu, mem => $default_service_mem } }
+ keys %$services
+ };
+
+ $self->write_dynamic_service_stats($stats);
+ }
+
my $cstatus = $self->read_hardware_status_nolock();
foreach my $node (sort keys %$cstatus) {
--
2.47.3
* [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (23 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
` (14 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Add a command to set dynamic service stats and handle the
set-dynamic-stats and set-static-stats commands analogously.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- merge the two branches for set-static-stats and set-dynamic-stats
commands to avoid code duplication
src/PVE/HA/Sim/Hardware.pm | 34 ++++++++++++++++++++++++++--------
src/PVE/HA/Sim/RTHardware.pm | 4 +++-
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index cb4a1504..89180ad7 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -795,7 +795,8 @@ sub get_cfs_state {
# service <sid> stop <timeout>
# service <sid> lock/unlock [lockname]
# service <sid> add <node> [<request-state=started>] [<running=0>]
-# service <sid> set-static-stats <maxcpu> <maxmem>
+# service <sid> set-static-stats [maxcpu <cores>] [maxmem <MiB>]
+# service <sid> set-dynamic-stats [cpu <cores>] [mem <MiB>]
# service <sid> delete
sub sim_hardware_cmd {
my ($self, $cmdstr, $logid) = @_;
@@ -945,15 +946,32 @@ sub sim_hardware_cmd {
$params[2] || 0,
);
- } elsif ($action eq 'set-static-stats') {
- die "sim_hardware_cmd: missing maxcpu for '$action' command" if !$params[0];
- die "sim_hardware_cmd: missing maxmem for '$action' command" if !$params[1];
+ } elsif ($action eq 'set-static-stats' || $action eq 'set-dynamic-stats') {
+ die "sim_hardware_cmd: missing target stat for '$action' command"
+ if !@params;
- $self->set_static_service_stats(
- $sid,
- { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
- );
+ my $conversions =
+ $action eq 'set-static-stats'
+ ? { maxcpu => sub { 0.0 + $_[0] }, maxmem => sub { $_[0] * 1024**2 } }
+ : { cpu => sub { 0.0 + $_[0] }, mem => sub { $_[0] * 1024**2 } };
+ my %new_stats;
+ for my ($target, $val) (@params) {
+ die "sim_hardware_cmd: missing value for '$action $target' command"
+ if !defined($val);
+
+ my $convert = $conversions->{$target}
+ or die
+ "sim_hardware_cmd: unknown target stat '$target' for '$action' command";
+
+ $new_stats{$target} = $convert->($val);
+ }
+
+ if ($action eq 'set-static-stats') {
+ $self->set_static_service_stats($sid, \%new_stats);
+ } else {
+ $self->set_dynamic_service_stats($sid, \%new_stats);
+ }
} elsif ($action eq 'delete') {
$self->delete_service($sid);
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 9a83d098..9528f542 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -532,7 +532,9 @@ sub show_service_add_dialog {
my $maxcpu = $cpu_count_spin->get_value();
my $maxmem = $memory_spin->get_value();
- $self->sim_hardware_cmd("service $sid set-static-stats $maxcpu $maxmem", 'command');
+ $self->sim_hardware_cmd(
+ "service $sid set-static-stats maxcpu $maxcpu maxmem $maxmem", 'command',
+ );
$self->add_service_to_gui($sid);
}
--
2.47.3
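The merged branch's approach — a per-action conversion table applied to alternating (name, value) parameters — can be sketched in Python. Illustrative only: names and error strings are assumptions, while the unit conversions (cores as floats, MiB scaled to bytes) mirror the patch:

```python
def parse_stat_params(action, params):
    """Turn a flat ['name', 'value', ...] parameter list into a typed stats
    dict, using a conversion table selected by the command action."""
    conversions = {
        "set-static-stats": {"maxcpu": float, "maxmem": lambda v: int(v) * 1024**2},
        "set-dynamic-stats": {"cpu": float, "mem": lambda v: int(v) * 1024**2},
    }[action]

    if not params:
        raise ValueError(f"missing target stat for '{action}' command")
    if len(params) % 2:
        raise ValueError(f"missing value for '{action} {params[-1]}' command")

    new_stats = {}
    for name, value in zip(params[::2], params[1::2]):
        if name not in conversions:
            raise ValueError(f"unknown target stat '{name}' for '{action}' command")
        new_stats[name] = conversions[name](value)
    return new_stats
```

Keeping one parsing loop and only swapping the table is what lets the two commands share a single branch.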
* [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (24 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
` (13 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Aggregation of dynamic node stats is lazy.
The getters log at warning level when stats are overcommitted.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- keep each commit functional on its own
- allow testing overcommitted scenarios
src/PVE/HA/Sim/Env.pm | 12 ++++++++
src/PVE/HA/Sim/Hardware.pm | 59 ++++++++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+)
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index ad51245c..65d4efad 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -500,12 +500,24 @@ sub get_static_service_stats {
return $self->{hardware}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{hardware}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 89180ad7..c9362fd6 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1196,6 +1196,27 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $stats = get_cluster_service_stats($self);
+ my $static_stats = $self->read_static_service_stats();
+ my $dynamic_stats = $self->read_dynamic_service_stats();
+
+ for my $sid (keys %$stats) {
+ $stats->{$sid}->{usage} = {
+ $static_stats->{$sid}->%*, $dynamic_stats->{$sid}->%*,
+ };
+
+ $self->log('warning', "overcommitted cpu on '$sid'")
+ if $stats->{$sid}->{usage}->{cpu} > $stats->{$sid}->{usage}->{maxcpu};
+ $self->log('warning', "overcommitted mem on '$sid'")
+ if $stats->{$sid}->{usage}->{mem} > $stats->{$sid}->{usage}->{maxmem};
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -1209,6 +1230,44 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $stats = $self->get_static_node_stats();
+ for my $node (keys %$stats) {
+ $stats->{$node}->{maxcpu} = $stats->{$node}->{maxcpu} // $default_node_maxcpu;
+ $stats->{$node}->{cpu} = $stats->{$node}->{cpu} // 0.0;
+ $stats->{$node}->{maxmem} = $stats->{$node}->{maxmem} // $default_node_maxmem;
+ $stats->{$node}->{mem} = $stats->{$node}->{mem} // 0;
+ }
+
+ my $service_conf = $self->read_service_config();
+ my $dynamic_service_stats = $self->get_dynamic_service_stats();
+
+ my $cstatus = $self->read_hardware_status_nolock();
+ my $node_service_status = { map { $_ => $self->read_service_status($_) } keys %$cstatus };
+
+ for my $sid (keys %$service_conf) {
+ my $node = $service_conf->{$sid}->{node};
+
+ if ($node_service_status->{$node}->{$sid}) {
+ my ($cpu, $mem) = $dynamic_service_stats->{$sid}->{usage}->@{qw(cpu mem)};
+
+ die "unknown cpu load for '$sid'" if !defined($cpu);
+ $stats->{$node}->{cpu} += $cpu;
+ $self->log('warning', "overcommitted cpu on '$node'")
+ if $stats->{$node}->{cpu} > $stats->{$node}->{maxcpu};
+
+ die "unknown memory usage for '$sid'" if !defined($mem);
+ $stats->{$node}->{mem} += $mem;
+ $self->log('warning', "overcommitted mem on '$node'")
+ if $stats->{$node}->{mem} > $stats->{$node}->{maxmem};
+ }
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
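The lazy aggregation in get_dynamic_node_stats() — adding each running service's dynamic usage onto its node and warning on overcommit — can be sketched in Python. This is a simplified, illustrative analogue that omits the per-node service-status lookup; the function name is an assumption:

```python
def aggregate_node_usage(nodes, services, warn):
    """Sum each service's dynamic cpu/mem onto its node's totals, raising on
    unknown usage and calling warn() whenever a node capacity is exceeded."""
    for sid, svc in services.items():
        node = nodes[svc["node"]]
        for used, cap in (("cpu", "maxcpu"), ("mem", "maxmem")):
            value = svc["usage"].get(used)
            if value is None:
                raise ValueError(f"unknown {used} for '{sid}'")
            node[used] += value
            if node[used] > node[cap]:
                warn(f"overcommitted {used} on '{svc['node']}'")
    return nodes
```

Overcommit is only warned about, not rejected, which matches the stated goal of letting tests exercise overcommitted scenarios.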
* [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (25 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
` (12 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The method already depends on three members of the service data, and a
following patch will need a fourth member to add more information to
the Usage implementations.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Manager.pm | 11 +++++------
src/PVE/HA/Usage.pm | 6 +++---
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 421c17da..d4b75ca9 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -281,17 +281,17 @@ sub recompute_online_node_usage {
foreach my $sid (sort keys %{ $self->{ss} }) {
my $sd = $self->{ss}->{$sid};
- $online_node_usage->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $online_node_usage->add_service_usage($sid, $sd);
}
# add remaining non-HA resources to online node usage
for my $sid (sort keys %$service_stats) {
next if $self->{ss}->{$sid};
- my ($node, $state) = $service_stats->{$sid}->@{qw(node state)};
-
# the migration target is not known for non-HA resources
- $online_node_usage->add_service_usage($sid, $state, $node, undef);
+ my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+
+ $online_node_usage->add_service_usage($sid, $sd);
}
$self->{online_node_usage} = $online_node_usage;
@@ -329,8 +329,7 @@ my $change_service_state = sub {
}
$self->{online_node_usage}->remove_service_usage($sid);
- $self->{online_node_usage}
- ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $self->{online_node_usage}->add_service_usage($sid, $sd);
$sd->{uid} = compute_new_uuid($new_state);
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 9f19a82b..6d53f956 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -40,12 +40,12 @@ sub add_service_usage_to_node {
die "implement in subclass";
}
-# Adds service $sid's usage to the online nodes according to their $state,
-# $service_node and $migration_target.
+# Adds service $sid's usage to the online nodes according to their service data $sd.
sub add_service_usage {
- my ($self, $sid, $service_state, $service_node, $migration_target) = @_;
+ my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
+ my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
my ($current_node, $target_node) =
get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (26 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
` (11 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Remove some unnecessary destructuring syntax for the helper.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Rules/ResourceAffinity.pm | 3 +--
src/PVE/HA/Usage.pm | 11 +++++------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 1c610430..474d3000 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -511,8 +511,7 @@ sub get_resource_affinity {
my $get_used_service_nodes = sub {
my ($sid) = @_;
return (undef, undef) if !defined($ss->{$sid});
- my ($state, $node, $target) = $ss->{$sid}->@{qw(state node target)};
- return PVE::HA::Usage::get_used_service_nodes($online_nodes, $state, $node, $target);
+ return PVE::HA::Usage::get_used_service_nodes($online_nodes, $ss->{$sid});
};
for my $csid (keys $positive->%*) {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 6d53f956..5f1ac226 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -45,9 +45,7 @@ sub add_service_usage {
my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
- my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
- my ($current_node, $target_node) =
- get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
+ my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
$self->add_service_usage_to_node($current_node, $sid) if $current_node;
$self->add_service_usage_to_node($target_node, $sid) if $target_node;
@@ -67,10 +65,11 @@ sub score_nodes_to_start_service {
}
# Returns the current and target node as a two-element array, that a service
-# puts load on according to the $online_nodes and the service's $state, $node
-# and $target.
+# puts load on according to the $online_nodes and the service data $sd.
sub get_used_service_nodes {
- my ($online_nodes, $state, $node, $target) = @_;
+ my ($online_nodes, $sd) = @_;
+
+ my ($state, $node, $target) = $sd->@{qw(state node target)};
return (undef, undef) if $state eq 'stopped' || $state eq 'request_start';
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 29/40] add running flag to cluster service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (27 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
` (10 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The running flag is needed to discriminate starting resources from
started ones and is a required parameter of the new add_service(...)
method of the resource scheduling bindings.
See the next patch, where the usage implementations pass the running
flag to the add_service(...) method, for more details.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Env/PVE2.pm | 1 +
src/PVE/HA/Manager.pm | 2 +-
src/PVE/HA/Sim/Hardware.pm | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 4dfb304e..a2173d95 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -549,6 +549,7 @@ my sub get_cluster_service_stats {
id => $id,
node => $nodename,
state => $state,
+ running => $state eq 'started',
type => $type,
usage => {},
};
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index d4b75ca9..152e18e5 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -289,7 +289,7 @@ sub recompute_online_node_usage {
next if $self->{ss}->{$sid};
# the migration target is not known for non-HA resources
- my $sd = { $service_stats->%{qw(node state)} };
+ my $sd = { $service_stats->{$sid}->%{qw(node state running)} };
$online_node_usage->add_service_usage($sid, $sd);
}
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index c9362fd6..c7e00bed 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1165,6 +1165,7 @@ my sub get_cluster_service_stats {
$stats->{$sid} = {
node => $cfg->{node},
state => $cfg->{state},
+ running => $cfg->{state} eq 'started',
usage => {},
};
}
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (28 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
` (9 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The pve_static (and upcoming pve_dynamic) bindings expose the new
add_resource(...) method, which allows adding a resource in a single
call together with the additional running flag.
The running flag is needed to discriminate starting HA resources from
started ones, which is required to correctly account for HA resources
in the dynamic load usage implementation in the next patch.
This is because with the dynamic load usage, any HA resource that is
scheduled to start by the HA Manager in the same round will not be
accounted for in the next call to score_nodes_to_start_resource(...).
This is not a problem for the static load usage, because there the
current node usages are derived from the running resources on every
call anyway.
Passing only the HA resources' 'state' property is not enough, since
the HA Manager moves any HA resource from the 'request_start' state (or
through other transient states such as 'request_start_balance' and a
successful 'migrate'/'relocate') into the 'started' state.
The 'started' state is then picked up by the HA resource's LRM, which
actually starts the HA resource and, if successful, responds with a
'SUCCESS' LRM result. Only then does the HA Manager acknowledge this by
adding the running flag to the HA resource's state.
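The state/flag interplay described above can be condensed into a small predicate; this is a sketch under the assumptions of this commit message, not the actual bindings:

```rust
// A resource only counts as running once the LRM has confirmed a
// successful start, i.e. the manager state is 'started' and the
// 'running' flag (set after the SUCCESS LRM result) is present.
fn counts_as_running(state: &str, running_flag: bool) -> bool {
    state == "started" && running_flag
}
```

A resource in 'started' state without the flag is merely scheduled to start, which is exactly the case the dynamic usage accounting must treat differently.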
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Usage.pm | 12 +++++++-----
src/PVE/HA/Usage/Basic.pm | 9 ++++++++-
src/PVE/HA/Usage/Static.pm | 20 ++++++++++++++------
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 5f1ac226..822b884c 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,9 +33,8 @@ sub contains_node {
die "implement in subclass";
}
-# Logs a warning to $haenv upon failure, but does not die.
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
die "implement in subclass";
}
@@ -47,8 +46,11 @@ sub add_service_usage {
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
- $self->add_service_usage_to_node($current_node, $sid) if $current_node;
- $self->add_service_usage_to_node($target_node, $sid) if $target_node;
+ # some usage implementations need to discern whether a service is truly running;
+ # a service only has the 'running' flag set while in the 'started' state
+ my $running = ($sd->{state} eq 'started' && $sd->{running}) || defined($current_node);
+
+ $self->add_service($sid, $current_node, $target_node, $running);
}
sub remove_service_usage {
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 2584727b..5aa3ac05 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -38,7 +38,7 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
-sub add_service_usage_to_node {
+my sub add_service_usage_to_node {
my ($self, $nodename, $sid) = @_;
if ($self->contains_node($nodename)) {
@@ -51,6 +51,13 @@ sub add_service_usage_to_node {
}
}
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ add_service_usage_to_node($self, $current_node, $sid) if defined($current_node);
+ add_service_usage_to_node($self, $target_node, $sid) if defined($target_node);
+}
+
sub remove_service_usage {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 6ff20794..835f4300 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -69,17 +69,25 @@ my sub get_service_usage {
return $service_stats;
}
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
- $self->{'node-services'}->{$nodename}->{$sid} = 1;
+ # do not add services which do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
eval {
my $service_usage = get_service_usage($self, $sid);
- $self->{scheduler}->add_service_usage_to_node($nodename, $sid, $service_usage);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ current_node => $current_node,
+ target_node => $target_node,
+ };
+
+ $self->{scheduler}->add_service($sid, $service);
};
- $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
- if $@;
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
}
sub remove_service_usage {
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (29 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
` (8 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The dynamic usage scheduler allows the HA Manager to make scheduling
decisions based on the current usage of the nodes and cluster resources
in addition to the maximum usage stats as reported by the PVE::HA::Env
implementation.
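For illustration, scoring on current usage roughly amounts to computing per-node load fractions from live stats plus the candidate service's load; a minimal sketch with assumed field shapes (not the real proxmox-resource-scheduling types):

```rust
// Assumed shapes for illustration: a node's current CPU usage and its
// total core count, as the dynamic node stats would report them.
struct NodeStats {
    cpu: f64,    // currently used CPU (in cores)
    maxcpu: u32, // total core count
}

// CPU fraction the node would be at after placing the service on it;
// lower is better when comparing candidate nodes.
fn cpu_fraction_after_start(node: &NodeStats, service_cpu: f64) -> f64 {
    (node.cpu + service_cpu) / node.maxcpu as f64
}
```

The same idea extends to memory; the real scheduler combines such per-metric fractions into a single node score.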
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- guard PVE::HA::Usage::Dynamic with my $have_dynamic_scheduling as
PVE::RS::ResourceScheduling::Dynamic might not be available (as
suggested by @Thomas)
- add add_service() impl
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Env.pm | 12 ++++
src/PVE/HA/Manager.pm | 21 +++++++
src/PVE/HA/Usage/Dynamic.pm | 110 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Makefile | 2 +-
5 files changed, 145 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage/Dynamic.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 38d5d60b..75220a0b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -42,6 +42,7 @@
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
/usr/share/perl5/PVE/HA/Usage/Static.pm
+/usr/share/perl5/PVE/HA/Usage/Dynamic.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
/usr/share/pve-manager/templates/default/fencing-body.html.hbs
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 3643292e..44c26854 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -312,12 +312,24 @@ sub get_static_service_stats {
return $self->{plug}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{plug}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 152e18e5..6f7b431b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -21,6 +21,12 @@ eval {
$have_static_scheduling = 1;
};
+my $have_dynamic_scheduling;
+eval {
+ require PVE::HA::Usage::Dynamic;
+ $have_dynamic_scheduling = 1;
+};
+
## Variable Name & Abbreviations Convention
#
# The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -264,6 +270,21 @@ sub recompute_online_node_usage {
'warning',
"fallback to 'basic' scheduler mode, init for 'static' failed - $@",
) if $@;
+ } elsif ($mode eq 'dynamic') {
+ if ($have_dynamic_scheduling) {
+ $online_node_usage = eval {
+ $service_stats = $haenv->get_dynamic_service_stats();
+ my $scheduler = PVE::HA::Usage::Dynamic->new($haenv, $service_stats);
+ $scheduler->add_node($_) for $online_nodes->@*;
+ return $scheduler;
+ };
+ } else {
+ $@ = "dynamic scheduling not available\n";
+ }
+ $haenv->log(
+ 'warning',
+ "fallback to 'basic' scheduler mode, init for 'dynamic' failed - $@",
+ ) if $@;
} elsif ($mode eq 'basic') {
# handled below in the general fall-back case
} else {
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
new file mode 100644
index 00000000..7e11715d
--- /dev/null
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -0,0 +1,110 @@
+package PVE::HA::Usage::Dynamic;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Dynamic;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv, $service_stats) = @_;
+
+ my $node_stats = eval { $haenv->get_dynamic_node_stats() };
+ die "did not get dynamic node usage information - $@" if $@;
+
+ my $scheduler = eval { PVE::RS::ResourceScheduling::Dynamic->new() };
+ die "unable to initialize dynamic scheduling - $@" if $@;
+
+ return bless {
+ 'node-stats' => $node_stats,
+ 'service-stats' => $service_stats,
+ haenv => $haenv,
+ scheduler => $scheduler,
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ my $stats = $self->{'node-stats'}->{$nodename}
+ or die "did not get dynamic node usage information for '$nodename'\n";
+ die "dynamic node usage information for '$nodename' missing cpu count\n" if !$stats->{maxcpu};
+ die "dynamic node usage information for '$nodename' missing memory\n" if !$stats->{maxmem};
+
+ eval { $self->{scheduler}->add_node($nodename, $stats); };
+ die "initializing dynamic node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+ my ($self, $sid) = @_;
+
+ my $service_stats = $self->{'service-stats'}->{$sid}->{usage}
+ or die "did not get dynamic service usage information for '$sid'\n";
+
+ return $service_stats;
+}
+
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ # do not add services which do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ eval {
+ my $service_usage = get_service_usage($self, $sid);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ current_node => $current_node,
+ target_node => $target_node,
+ };
+
+ $self->{scheduler}->add_resource($sid, $service);
+ };
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
+}
+
+sub remove_service_usage {
+ my ($self, $sid) = @_;
+
+ eval { $self->{scheduler}->remove_resource($sid) };
+ $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid) = @_;
+
+ my $score_list = eval {
+ my $service_usage = get_service_usage($self, $sid);
+ $self->{scheduler}->score_nodes_to_start_resource($service_usage);
+ };
+ $self->{haenv}
+ ->log('err', "unable to score nodes according to dynamic usage for service '$sid' - $@")
+ if $@;
+
+ # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+ return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index befdda60..5d51a9c1 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,5 +1,5 @@
SIM_SOURCES=Basic.pm
-SOURCES=${SIM_SOURCES} Static.pm
+SOURCES=${SIM_SOURCES} Static.pm Dynamic.pm
.PHONY: install
install:
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (30 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
` (7 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the scheduler using the
dynamic usage information of the HA resources, with rebalance-on-start
cleared and set, respectively.
As the mechanisms for the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:
- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
correctly simulates that the previously scheduled HA resources are
already started
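The second point can be illustrated with a toy greedy placement (a sketch only, not the actual scheduler): each resource goes to the currently least-loaded node, and its load is accounted for immediately so that subsequent picks see it, mirroring how previously scheduled resources are simulated as already started.

```rust
// Toy greedy placement: `loads` holds (node name, current load); each
// service's load is added to its chosen node before the next pick,
// so back-to-back scheduling decisions spread out instead of piling
// onto the same node.
fn assign(mut loads: Vec<(String, f64)>, services: &[f64]) -> Vec<String> {
    let mut placement = Vec::new();
    for &s in services {
        // index of the currently least-loaded node
        let idx = loads
            .iter()
            .enumerate()
            .min_by(|a, b| (a.1).1.partial_cmp(&(b.1).1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        loads[idx].1 += s; // account for the service immediately
        placement.push(loads[idx].0.clone());
    }
    placement
}
```

Without the immediate accounting step, every service in a round would be scored against the same stale node loads.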
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/test/test-crs-dynamic-rebalance1/README | 3 +
src/test/test-crs-dynamic-rebalance1/cmdlist | 4 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 7 ++
.../hardware_status | 5 ++
.../test-crs-dynamic-rebalance1/log.expect | 88 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 7 ++
.../static_service_stats | 7 ++
src/test/test-crs-dynamic1/README | 4 +
src/test/test-crs-dynamic1/cmdlist | 4 +
src/test/test-crs-dynamic1/datacenter.cfg | 6 ++
.../test-crs-dynamic1/dynamic_service_stats | 3 +
src/test/test-crs-dynamic1/hardware_status | 5 ++
src/test/test-crs-dynamic1/log.expect | 51 +++++++++++
src/test/test-crs-dynamic1/manager_status | 1 +
src/test/test-crs-dynamic1/service_config | 3 +
.../test-crs-dynamic1/static_service_stats | 3 +
18 files changed, 209 insertions(+)
create mode 100644 src/test/test-crs-dynamic-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic1/README
create mode 100644 src/test/test-crs-dynamic1/cmdlist
create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic1/hardware_status
create mode 100644 src/test/test-crs-dynamic1/log.expect
create mode 100644 src/test/test-crs-dynamic1/manager_status
create mode 100644 src/test/test-crs-dynamic1/service_config
create mode 100644 src/test/test-crs-dynamic1/static_service_stats
diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how after a failed node the recovery gets
+balanced out for a small batch of HA resources with the dynamic usage
+information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-rebalance-on-start": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+ "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+ "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+ "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+ "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..4017f7be
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,88 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:105: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:101 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:102 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:102 - end relocate to node 'node2'
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:104': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:105': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:104' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:104': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:105' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:105': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 241 node1/lrm: starting service vm:104
+info 241 node1/lrm: service status vm:104 started
+info 241 node1/lrm: starting service vm:105
+info 241 node1/lrm: service status vm:105 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "dynamic"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'dynamic'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (31 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
` (6 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The name is misleading, because the function does not execute the HA
resource migration, but only queues the HA resource to change into the
state 'migrate' or 'relocate', which is then picked up by the
respective LRM for execution.
The term 'resource motion' also generalizes over the different actions
implied by the 'migrate' and 'relocate' commands and states.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Manager.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 6f7b431b..b954092b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -416,7 +416,7 @@ sub read_lrm_status {
return ($results, $modes);
}
-sub execute_migration {
+sub queue_resource_motion {
my ($self, $cmd, $task, $sid, $target) = @_;
my ($haenv, $ss) = $self->@{qw(haenv ss)};
@@ -485,7 +485,7 @@ sub update_crm_commands {
"ignore crm command - service already on target node: $cmd",
);
} else {
- $self->execute_migration($cmd, $task, $sid, $node);
+ $self->queue_resource_motion($cmd, $task, $sid, $node);
}
}
} else {
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (32 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
` (5 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Manager.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b954092b..872d43c4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -91,11 +91,12 @@ sub update_crs_scheduler_mode {
my $haenv = $self->{haenv};
my $dc_cfg = $haenv->get_datacenter_settings();
+ my $crs_cfg = $dc_cfg->{crs};
- $self->{crs}->{rebalance_on_request_start} = !!$dc_cfg->{crs}->{'ha-rebalance-on-start'};
+ $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
my $old_mode = $self->{crs}->{scheduler};
- my $new_mode = $dc_cfg->{crs}->{ha} || 'basic';
+ my $new_mode = $crs_cfg->{ha} || 'basic';
if (!defined($old_mode)) {
$haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 35/40] implement automatic rebalancing
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (33 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
` (4 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
If the automatic load balancing system is enabled, it checks whether the
cluster node imbalance exceeds a user-defined threshold for a number of
consecutive HA Manager rounds ("hold duration"). Once the threshold has
been exceeded for that many consecutive rounds, it chooses the best
resource motion to reduce the cluster node imbalance and queues it, but
only if it improves the imbalance by at least a user-defined relative
improvement ("margin").
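The trigger condition described above can be sketched as follows. This
is an illustrative Python model of the logic, not the actual Perl
implementation; function names are made up, and the defaults mirror the
values configured in this patch:

```python
def should_rebalance(imbalance, state, threshold=0.7, hold_duration=3):
    """Trigger only once the imbalance threshold has been exceeded for
    `hold_duration` consecutive rounds; reset the counter otherwise."""
    if imbalance < threshold:
        state["sustained_rounds"] = 0
        return False
    state["sustained_rounds"] += 1
    if state["sustained_rounds"] < hold_duration:
        return False
    state["sustained_rounds"] = 0  # also reset after triggering
    return True

def improves_enough(imbalance, target_imbalance, margin=0.1):
    """Queue the best motion only if the relative improvement is at
    least the configured margin."""
    return (imbalance - target_imbalance) / imbalance >= margin
```

Resetting the counter both when the imbalance drops below the threshold
and after a triggered rebalance keeps a short spike from causing
repeated migrations.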
This patch introduces resource bundles, which ensure that HA resources
in strict positive resource affinity rules are considered as a whole
"bundle" instead of individual HA resources.
Specifically, active and stationary resource bundles are resource
bundles, that have at least one resource running and all resources
located on the same node. This distinction is needed as newly created
strict positive resource affinity rules may still require some resource
motions to enforce the rule.
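The grouping rules can be sketched like this. This is an illustrative
Python model, not the Perl code below; it assumes `services` maps each
SID to its state/node and `positive_affinity` maps each SID to its
strictly positive-affinitive peers:

```python
def active_stationary_bundles(services, positive_affinity):
    bundles = {}
    for sid in sorted(services):
        # only a started resource can lead or join an active bundle
        if services[sid]["state"] != "started":
            continue
        members = [sid]
        nodes = {services[sid]["node"]}
        for dep in positive_affinity.get(sid, ()):
            if dep not in services:
                continue
            dep_state = services[dep]["state"]
            if dep_state in ("migrate", "relocate"):
                break  # a moving member disqualifies the whole bundle
            if dep_state != "started":
                continue  # stopped members are left out of the bundle
            nodes.add(services[dep]["node"])
            members.append(dep)
        else:
            # stationary: all running members already on the same node
            if len(nodes) == 1:
                bundles[min(members)] = sorted(members)
    return bundles
```

The bundle is keyed by its leader, the alphabetically first running
member, so mutually affinitive resources all map to the same entry.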
Additionally, the migration candidate generation prunes any target
nodes, which do not adhere to the HA rules of these resource bundles
before scoring these migration candidates.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add more context in patch message
- add comment to sustained_imbalance_round (as suggested by @Thomas)
- fix issue where resource bundle was created even though some dependent
resources were still migrating or relocating
- remove debug logging of node imbalance
- remove unused calculate_node_loads()
- remove select_best_balancing_migration{,_topsis}() from Static and
Dynamic and make it a proxy in PVE::HA::Usage
src/PVE/HA/Manager.pm | 177 +++++++++++++++++++++++++++++++++++-
src/PVE/HA/Usage.pm | 34 +++++++
src/PVE/HA/Usage/Dynamic.pm | 33 +++++++
src/PVE/HA/Usage/Static.pm | 33 +++++++
4 files changed, 276 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 872d43c4..73146b56 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -59,10 +59,17 @@ sub new {
my $self = bless {
haenv => $haenv,
- crs => {},
+ crs => {
+ auto_rebalance => {},
+ },
last_rules_digest => '',
last_groups_digest => '',
last_services_digest => '',
+ # used to track how many HA rounds the imbalance threshold has been exceeded
+ #
+ # this is not persisted for a CRM failover as in the meantime
+ # the usage statistics might have changed quite a bit already
+ sustained_imbalance_round => 0,
group_migration_round => 3, # wait a little bit
}, $class;
@@ -94,6 +101,13 @@ sub update_crs_scheduler_mode {
my $crs_cfg = $dc_cfg->{crs};
$self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
+ $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
+ $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.7;
+ $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
+ // 'bruteforce';
+ $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
+ // 3;
+ $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
my $old_mode = $self->{crs}->{scheduler};
my $new_mode = $crs_cfg->{ha} || 'basic';
@@ -111,6 +125,150 @@ sub update_crs_scheduler_mode {
return;
}
+# Returns a hash of lists, which contain the running, non-moving HA resource
+# bundles, which are on the same node, implied by the strict positive resource
+# affinity rules.
+#
+# Each resource bundle has a leader, which is the alphabetically first running
+# HA resource in the resource bundle and also the key of each resource bundle
+# in the returned hash.
+sub get_active_stationary_resource_bundles {
+ my ($ss, $resource_affinity) = @_;
+
+ my $resource_bundles = {};
+OUTER: for my $sid (sort keys %$ss) {
+ # do not consider non-started resource as 'active' leading resource
+ next if $ss->{$sid}->{state} ne 'started';
+
+ my @resources = ($sid);
+ my $nodes = { $ss->{$sid}->{node} => 1 };
+
+ my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
+ if (%$dependent_resources) {
+ for my $csid (keys %$dependent_resources) {
+ next if !defined($ss->{$csid});
+ my ($state, $node) = $ss->{$csid}->@{qw(state node)};
+
+ # do not consider stationary bundle if a dependent resource moves
+ next OUTER if $state eq 'migrate' || $state eq 'relocate';
+ # do not add non-started resource to active bundle
+ next if $state ne 'started';
+
+ $nodes->{$node} = 1;
+
+ push @resources, $csid;
+ }
+
+ @resources = sort @resources;
+ }
+
+ # skip resource bundles, which are not on the same node yet
+ next if keys %$nodes > 1;
+
+ my $leader_sid = $resources[0];
+
+ $resource_bundles->{$leader_sid} = \@resources;
+ }
+
+ return $resource_bundles;
+}
+
+# Returns a hash of hashes, where each item contains the resource bundle's
+# leader, the list of HA resources in the resource bundle, and the list of
+# possible nodes to migrate to.
+sub get_resource_migration_candidates {
+ my ($self) = @_;
+
+ my ($ss, $compiled_rules, $online_node_usage) =
+ $self->@{qw(ss compiled_rules online_node_usage)};
+ my ($node_affinity, $resource_affinity) =
+ $compiled_rules->@{qw(node-affinity resource-affinity)};
+
+ my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ my @compact_migration_candidates = ();
+ for my $leader_sid (sort keys %$resource_bundles) {
+ my $current_leader_node = $ss->{$leader_sid}->{node};
+ my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
+
+ my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+ my ($together, $separate) =
+ get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
+ apply_negative_resource_affinity($separate, $target_nodes);
+
+ delete $target_nodes->{$current_leader_node};
+
+ next if !%$target_nodes;
+
+ push @compact_migration_candidates,
+ {
+ leader => $leader_sid,
+ nodes => [sort keys %$target_nodes],
+ resources => $resource_bundles->{$leader_sid},
+ };
+ }
+
+ return \@compact_migration_candidates;
+}
+
+sub load_balance {
+ my ($self) = @_;
+
+ my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
+ my ($auto_rebalance_opts) = $crs->{auto_rebalance};
+
+ return if !$auto_rebalance_opts->{enable};
+ return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
+ return if $self->any_resource_motion_queued_or_running();
+
+ my ($threshold, $method, $hold_duration, $margin) =
+ $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
+
+ my $imbalance = $online_node_usage->calculate_node_imbalance();
+
+ # do not load balance unless imbalance threshold has been exceeded
+ # consecutively for $hold_duration calls to load_balance()
+ if ($imbalance < $threshold) {
+ $self->{sustained_imbalance_round} = 0;
+ return;
+ } else {
+ $self->{sustained_imbalance_round}++;
+ return if $self->{sustained_imbalance_round} < $hold_duration;
+ $self->{sustained_imbalance_round} = 0;
+ }
+
+ my $candidates = $self->get_resource_migration_candidates();
+
+ my $result;
+ if ($method eq 'bruteforce') {
+ $result = $online_node_usage->select_best_balancing_migration($candidates);
+ } elsif ($method eq 'topsis') {
+ $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
+ }
+
+ # happens if $candidates is empty or $method isn't handled above
+ return if !$result;
+
+ my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
+
+ my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
+ return if $relative_change < $margin;
+
+ my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
+
+ my (undef, $type, $id) = $haenv->parse_sid($sid);
+ my $task = $type eq 'vm' ? "migrate" : "relocate";
+ my $cmd = "$task $sid $target";
+
+ my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
+ $haenv->log(
+ 'info',
+ "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
+ );
+
+ $self->queue_resource_motion($cmd, $task, $sid, $target);
+}
+
sub cleanup {
my ($self) = @_;
@@ -463,6 +621,21 @@ sub queue_resource_motion {
}
}
+sub any_resource_motion_queued_or_running {
+ my ($self) = @_;
+
+ my ($ss) = $self->@{qw(ss)};
+
+ for my $sid (keys %$ss) {
+ my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
+
+ return 1 if $state eq 'migrate' || $state eq 'relocate';
+ return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
+ }
+
+ return 0;
+}
+
# read new crm commands and save them into crm master status
sub update_crm_commands {
my ($self) = @_;
@@ -746,6 +919,8 @@ sub manage {
$self->update_crm_commands();
+ $self->load_balance();
+
for (;;) {
my $repeat = 0;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 822b884c..dc029e86 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -59,6 +59,40 @@ sub remove_service_usage {
die "implement in subclass";
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ die "implement in subclass";
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration_topsis {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
# Returns a hash with $nodename => $score pairs. A lower $score is better.
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
index 7e11715d..a8adfe83 100644
--- a/src/PVE/HA/Usage/Dynamic.pm
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -92,6 +92,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate dynamic node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 835f4300..92bfaaa7 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -99,6 +99,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate static node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (34 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
` (3 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document which resource bundles count as active and
stationary and ensure that get_active_stationary_resource_bundles(...)
produces the correct active, stationary resource bundles.
This is especially important, because these resource bundles are used
for the load balancing candidate generation, which is passed to
score_best_balancing_migration_candidates($candidates, ...). The
PVE::HA::Usage::{Static,Dynamic} implementations validate these
candidates and fail with a user-visible error message.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/test/Makefile | 1 +
src/test/test_resource_bundles.pl | 234 ++++++++++++++++++++++++++++++
2 files changed, 235 insertions(+)
create mode 100755 src/test/test_resource_bundles.pl
diff --git a/src/test/Makefile b/src/test/Makefile
index 6da9e100..f72b755b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -6,6 +6,7 @@ test:
@echo "-- start regression tests --"
./test_failover1.pl
./test_rules_config.pl
+ ./test_resource_bundles.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
diff --git a/src/test/test_resource_bundles.pl b/src/test/test_resource_bundles.pl
new file mode 100755
index 00000000..d38dc516
--- /dev/null
+++ b/src/test/test_resource_bundles.pl
@@ -0,0 +1,234 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use Test::More;
+
+use PVE::HA::Manager;
+
+my $get_active_stationary_resource_bundle_tests = [
+ {
+ description => "trivial resource bundles",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {},
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101',
+ ],
+ 'vm:102' => [
+ 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "simple resource bundle",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with first resource stopped",
+ services => {
+ 'vm:101' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:102' => [
+ 'vm:102', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with some stopped resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with moving resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'migrate',
+ node => 'node2',
+ target => 'node1',
+ },
+ 'vm:103' => {
+ state => 'relocate',
+ node => 'node3',
+ target => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+ # might happen if the resource bundle is generated even before the HA Manager
+ # puts the HA resources in migrate/relocate to make them adhere to the HA rules
+ {
+ description => "resource bundle with resources on different nodes",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node2',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node3',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+];
+
+my $tests = [
+ @$get_active_stationary_resource_bundle_tests,
+];
+
+plan(tests => scalar($tests->@*));
+
+for my $case ($get_active_stationary_resource_bundle_tests->@*) {
+ my ($ss, $resource_affinity) = $case->@{qw(services resource_affinity)};
+
+ my $result = PVE::HA::Manager::get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ is_deeply($result, $case->{resource_bundles}, $case->{description});
+}
+
+done_testing();
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (35 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
` (2 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the automatic load
rebalancer using the dynamic usage stats.
As an overview:
- Case 0: rebalancing system is inactive for no configured HA resources
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converges once
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through dynamic
changes in their usage
- Case 4: rebalancing system doesn't trigger a migration if the node
imbalance threshold is exceeded once but isn't sustained for at least
the set hold duration
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../test-crs-dynamic-auto-rebalance0/README | 2 +
.../test-crs-dynamic-auto-rebalance0/cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../test-crs-dynamic-auto-rebalance1/README | 7 ++
.../test-crs-dynamic-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-dynamic-auto-rebalance2/README | 4 +
.../test-crs-dynamic-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-dynamic-auto-rebalance3/README | 4 +
.../test-crs-dynamic-auto-rebalance3/cmdlist | 16 ++++
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 80 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
.../test-crs-dynamic-auto-rebalance4/README | 11 +++
.../test-crs-dynamic-auto-rebalance4/cmdlist | 13 +++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
45 files changed, 451 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/README b/src/test/test-crs-dynamic-auto-rebalance0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
new file mode 100644
index 00000000..6526c203
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-threshold": 0.7
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/log.expect b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/manager_status b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/service_config b/src/test/test-crs-dynamic-auto-rebalance0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/README b/src/test/test-crs-dynamic-auto-rebalance1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that
+can exceed the minimum imbalance improvement threshold, i.e., improve the
+imbalance enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/service_config b/src/test/test-crs-dynamic-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/README b/src/test/test-crs-dynamic-auto-rebalance2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources from a single node to
+other cluster nodes to reach a minimum cluster node imbalance in a
+homogeneous cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/service_config b/src/test/test-crs-dynamic-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/README b/src/test/test-crs-dynamic-auto-rebalance3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
new file mode 100644
index 00000000..a07fe721
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 260 node1/crm: auto rebalance - migrate vm:103 to node1 (expected target imbalance: 0.4)
+info 260 node1/crm: got crm command: migrate vm:103 node1
+info 260 node1/crm: migrate service 'vm:103' to node 'node1'
+info 260 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 263 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 263 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 280 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 281 node1/lrm: starting service vm:103
+info 281 node1/lrm: service status vm:103 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/service_config b/src/test/test-crs-dynamic-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/README b/src/test/test-crs-dynamic-auto-rebalance4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources that cause a
+transient spike in their dynamic usage, one that makes the nodes exceed the
+imbalance threshold but subsides below it again before the hold duration
+expires.
+
+This test relies on the fact that a command batch from the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage right after the simulated spike, so that the current
+imbalance falls below the threshold again, undercuts the hold duration by one
+HA round and therefore prevents triggering a rebalancing migration.
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..14059a3e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/service_config b/src/test/test-crs-dynamic-auto-rebalance4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 38/40] test: add static automatic rebalancing system test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (36 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases are derivatives of the dynamic automatic rebalancing
system test cases 1 to 3 and ensure that the automatic rebalancing
system provides the same basic functionality when using static usage
information.
The other dynamic usage test cases are not included here, because they
are invariant to the provided usage information and only exercise
further edge cases.
As an overview:
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converge if
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through changes
in their static usage
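
The "node imbalance" and "improvement threshold" logic these cases exercise
is not part of this patch itself. As an illustration only (the actual
scheduler implementation lives in the Rust crates and may differ), a node
imbalance check along these lines can be sketched with the coefficient of
variation of per-node usage fractions and a hypothetical trigger threshold:

```python
def node_imbalance(usages):
    """Coefficient of variation (stddev / mean) of per-node usage fractions.

    A perfectly balanced cluster yields 0.0; the more load concentrates on
    a few nodes, the larger the value gets.
    """
    mean = sum(usages) / len(usages)
    if mean == 0:
        return 0.0
    variance = sum((u - mean) ** 2 for u in usages) / len(usages)
    return (variance ** 0.5) / mean

def should_rebalance(usages, trigger=0.25):
    # 'trigger' is a made-up threshold for this sketch, not a real
    # ha-manager or datacenter.cfg setting.
    return node_imbalance(usages) > trigger
```

With this sketch, a cluster like test case 2 (all resources on one node,
e.g. usage fractions `[0.9, 0.1, 0.1]`) triggers rebalancing, while an even
spread (`[0.3, 0.3, 0.3]`) does not.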
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../test-crs-static-auto-rebalance1/README | 7 ++
.../test-crs-static-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-static-auto-rebalance2/README | 4 +
.../test-crs-static-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-static-auto-rebalance3/README | 3 +
.../test-crs-static-auto-rebalance3/cmdlist | 15 ++++
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 79 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
24 files changed, 273 insertions(+)
create mode 100644 src/test/test-crs-static-auto-rebalance1/README
create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance2/README
create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance3/README
create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats
diff --git a/src/test/test-crs-static-auto-rebalance1/README b/src/test/test-crs-static-auto-rebalance1/README
new file mode 100644
index 00000000..8f97ac55
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with static usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-static-auto-rebalance1/cmdlist b/src/test/test-crs-static-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance1/datacenter.cfg b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance1/hardware_status b/src/test/test-crs-static-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/log.expect b/src/test/test-crs-static-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d2c27bec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance1/manager_status b/src/test/test-crs-static-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance1/service_config b/src/test/test-crs-static-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/static_service_stats b/src/test/test-crs-static-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/README b/src/test/test-crs-static-auto-rebalance2/README
new file mode 100644
index 00000000..1d1b9d6e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running, homogeneous HA resources on a single node to other
+cluster nodes to reach a minimum cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-static-auto-rebalance2/cmdlist b/src/test/test-crs-static-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance2/datacenter.cfg b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance2/hardware_status b/src/test/test-crs-static-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/log.expect b/src/test/test-crs-static-auto-rebalance2/log.expect
new file mode 100644
index 00000000..3df96d83
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance2/manager_status b/src/test/test-crs-static-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-auto-rebalance2/service_config b/src/test/test-crs-static-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/static_service_stats b/src/test/test-crs-static-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/README b/src/test/test-crs-static-auto-rebalance3/README
new file mode 100644
index 00000000..2f57dac2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/README
@@ -0,0 +1,3 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running HA resources, where the static usage stats of some
+HA resources change over time, to reach minimum cluster node imbalance.
diff --git a/src/test/test-crs-static-auto-rebalance3/cmdlist b/src/test/test-crs-static-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..f18798b0
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/cmdlist
@@ -0,0 +1,15 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:106 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:107 set-static-stats maxcpu 8.0 maxmem 8192"
+ ],
+ [
+ "service vm:101 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:102 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:103 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:104 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:105 set-static-stats maxcpu 1.0 maxmem 1024"
+ ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance3/datacenter.cfg b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance3/hardware_status b/src/test/test-crs-static-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/log.expect b/src/test/test-crs-static-auto-rebalance3/log.expect
new file mode 100644
index 00000000..ddb4e5ec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/log.expect
@@ -0,0 +1,79 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:106 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:107 set-static-stats maxcpu 8.0 maxmem 8192
+info 160 node1/crm: auto rebalance - migrate vm:105 to node1 (expected target imbalance: 0.47)
+info 160 node1/crm: got crm command: migrate vm:105 node1
+info 160 node1/crm: migrate service 'vm:105' to node 'node1'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node1'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node1'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 181 node1/lrm: starting service vm:105
+info 181 node1/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:102 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:103 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:104 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:105 set-static-stats maxcpu 1.0 maxmem 1024
+info 260 node1/crm: auto rebalance - migrate vm:106 to node2 (expected target imbalance: 0.42)
+info 260 node1/crm: got crm command: migrate vm:106 node2
+info 260 node1/crm: migrate service 'vm:106' to node 'node2'
+info 260 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:106 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:106 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node2)
+info 283 node2/lrm: starting service vm:106
+info 283 node2/lrm: service status vm:106 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance3/manager_status b/src/test/test-crs-static-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance3/service_config b/src/test/test-crs-static-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/static_service_stats b/src/test/test-crs-static-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..560a6fe8
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:105": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:106": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:107": { "maxcpu": 2.0, "maxmem": 2147483648 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (37 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases are clones of the dynamic automatic rebalancing system
test cases 0 through 4 and ensure that the automatic rebalancing system
provides the same basic functionality when using the TOPSIS method.
The expected outputs are exactly the same, except for test case 3, where
the second migration changes from
vm:103 to node1 with an expected target imbalance of 0.40
to
vm:103 to node3 with an expected target imbalance of 0.43.
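
For context, TOPSIS is a standard multi-criteria ranking method: normalize
the criteria matrix, weight it, and rank alternatives by their relative
closeness to the ideal solution. The sketch below is the textbook
formulation only, not the implementation in the scheduler crate, which may
weight and normalize differently:

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns).

    'benefit[j]' is True when a higher value in column j is better
    (e.g. free capacity) and False when lower is better (e.g. usage).
    Returns one closeness score in [0, 1] per alternative; higher is better.
    """
    ncols = len(weights)
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(ncols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(ncols)]
                for row in matrix]
    # ideal/anti-ideal point per criterion, direction depends on benefit[j]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*weighted))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*weighted))]
    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((row[j] - ideal[j]) ** 2 for j in range(ncols)))
        d_neg = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(ncols)))
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    return scores
```

For two candidate target nodes described by (cpu usage, memory usage), the
less loaded node ranks higher when both criteria are costs.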
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../README | 2 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../README | 7 ++
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../README | 4 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 4 +
.../cmdlist | 16 ++++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 80 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
.../README | 11 +++
.../cmdlist | 13 +++
.../datacenter.cfg | 9 +++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
45 files changed, 455 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/README b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/README b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/README b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple identical running HA resources from a single node to other
+cluster nodes in order to reach a minimal node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/README b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with differing dynamic usages, some of
+which change over time, in order to reach a minimal cluster node
+imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
new file mode 100644
index 00000000..4aaddd39
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 260 node1/crm: auto rebalance - migrate vm:103 to node3 (expected target imbalance: 0.43)
+info 260 node1/crm: got crm command: migrate vm:103 node3
+info 260 node1/crm: migrate service 'vm:103' to node 'node3'
+info 260 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 263 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 263 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 280 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 285 node3/lrm: starting service vm:103
+info 285 node3/lrm: service status vm:103 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/README b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage spikes transiently, pushing the nodes above the imbalance threshold,
+but falls back below the threshold before the hold duration
+expires.
+
+This test relies on the fact that every command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values below the imbalance threshold right after the
+simulated spike undercuts the hold duration by one HA round, so no rebalancing
+migration is triggered.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
new file mode 100644
index 00000000..0fb3fdc3
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis",
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (38 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document and verify some behaviors of the automatic
rebalancing system in combination with HA affinity rules.
All of these test cases use only the dynamic usage information and the
bruteforce method, as waiting on ongoing migrations and candidate generation
are invariant to those parameters.
As an overview:
- Case 1: rebalancing system acknowledges node affinity rules
- Case 2: rebalancing system considers HA resources in strict positive
resource affinity rules as a single unit (a resource bundle)
and will not split them apart
- Case 3: rebalancing system will wait on the migration of a not-yet
enforced strict positive resource affinity rule, i.e., the
HA resources still need to migrate to their common node
- Case 4: rebalancing system will acknowledge strict negative resource
affinity rules, but will still try to minimize the node
imbalance as much as possible
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../README | 7 +++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 5 ++
.../hardware_status | 5 ++
.../log.expect | 49 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 ++
.../service_config | 5 ++
.../static_service_stats | 5 ++
.../README | 12 ++++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 4 ++
.../hardware_status | 5 ++
.../log.expect | 53 +++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 4 ++
.../static_service_stats | 4 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 8 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 31 ++++++++++
.../rules_config | 3 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 7 +++
.../service_config | 6 ++
.../static_service_stats | 6 ++
40 files changed, 452 insertions(+)
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/README b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
new file mode 100644
index 00000000..8504755f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance running HA resources that cause a node imbalance exceeding the
+threshold, because their HA node affinity rules strictly require them to be
+kept on specific nodes.
+
+As a sanity check, the added HA resource, which is not part of the node
+affinity rule, is rebalanced to another node to lower the imbalance.
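The candidate filtering this test exercises can be sketched as follows; the function name and data layout are hypothetical, chosen only to mirror the fixture above:

```rust
use std::collections::HashMap;

// Hypothetical candidate filter: resources pinned to their current node by a
// strict node affinity rule are excluded from rebalancing.
fn rebalance_candidates<'a>(
    resources: &'a [&'a str],
    pinned: &HashMap<&str, &str>, // resource id -> required node
) -> Vec<&'a str> {
    resources
        .iter()
        .copied()
        .filter(|res| !pinned.contains_key(res))
        .collect()
}

fn main() {
    let pinned: HashMap<_, _> =
        [("vm:101", "node1"), ("vm:102", "node1"), ("vm:103", "node1")].into();
    let all = ["vm:101", "vm:102", "vm:103", "vm:104"];
    // Only vm:104, which is not part of the node affinity rule, may move.
    assert_eq!(rebalance_candidates(&all, &pinned), vec!["vm:104"]);
}
```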
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..6ee04948
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:104 add node1 started 1",
+ "service vm:104 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:104 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..02133ab0
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d0b2aee2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
@@ -0,0 +1,49 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:104 add node1 started 1
+info 120 cmdlist: execute service vm:104 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:104 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:104' on node 'node1'
+info 120 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.98)
+info 140 node1/crm: got crm command: migrate vm:104 node2
+info 140 node1/crm: migrate service 'vm:104' to node 'node2'
+info 140 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:104
+info 163 node2/lrm: service status vm:104 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
new file mode 100644
index 00000000..00f615e9
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-stays-on-node1
+ nodes node1
+ resources vm:101,vm:102,vm:103
+ strict 1
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
new file mode 100644
index 00000000..57e3579d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..b11cc5eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/README b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
new file mode 100644
index 00000000..be072f6d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
@@ -0,0 +1,12 @@
+Test that the auto rebalance system with dynamic usage information will
+consider running HA resources in strict positive resource affinity rules as
+bundles, which can only be moved to other nodes as single units.
+
+Therefore, even though the two initial HA resources cause a node imbalance in
+the cluster and would otherwise be split apart, the auto rebalance system does
+not issue a rebalancing migration, because they must stay together.
+
+As a sanity check, adding another HA resource, which is not part of the strict
+positive resource affinity rule, causes a rebalancing migration: here the
+resource bundle itself is migrated, because its leading resource 'vm:101'
+comes first alphabetically.
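The bundle-as-one-unit behavior can be sketched like this; the function is an illustrative assumption, not the actual ha-manager implementation:

```rust
// Hypothetical sketch: a strict positive affinity rule forms one migration
// unit, identified by its alphabetically first resource id.
fn bundle_leader<'a>(bundle: &mut Vec<&'a str>) -> &'a str {
    bundle.sort(); // "vm:101" sorts before "vm:102" lexicographically
    bundle[0]
}

fn main() {
    let mut bundle = vec!["vm:102", "vm:101"];
    // The rebalancer issues a single migration for the leader; affinity
    // handling then moves every bundle member to the same target node.
    assert_eq!(bundle_leader(&mut bundle), "vm:101");
}
```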
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..61373367
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:103 add node1 started 1",
+ "service vm:103 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:103 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..4f81dfe2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
new file mode 100644
index 00000000..48501321
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:103 add node1 started 1
+info 120 cmdlist: execute service vm:103 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:103 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:103' on node 'node1'
+info 120 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.86)
+info 140 node1/crm: got crm command: migrate vm:101 node2
+info 140 node1/crm: crm command 'migrate vm:101 node2' - migrate service 'vm:102' to node 'node2' (service 'vm:102' in positive affinity with service 'vm:101')
+info 140 node1/crm: migrate service 'vm:101' to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 140 node1/crm: migrate service 'vm:102' to node 'node2'
+info 140 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 160 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:101
+info 163 node2/lrm: service status vm:101 started
+info 163 node2/lrm: starting service vm:102
+info 163 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
new file mode 100644
index 00000000..e1948a00
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:102
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
new file mode 100644
index 00000000..880e0a59
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..455ae043
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/README b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
new file mode 100644
index 00000000..4b4d4855
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will wait
+for an ongoing resource migration to finish, because a strict positive
+resource affinity rule is not enforced yet.
+
+This test case manipulates the manager status so that the HA Manager assumes
+the not-yet-migrated HA resource in the strict positive resource affinity
+rule is still migrating, as the integration tests currently do not support
+prolonged migrations.
+
+Furthermore, setting the hold duration to 0 forces auto rebalancing migrations
+to be issued as soon as possible. This ensures that if the auto rebalance
+system did not wait on the ongoing migration, the rebalancing migration would
+be issued right away, in the same round in which the HA resources are
+acknowledged as running.
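The wait condition being tested can be sketched as a simple settledness check; the state strings and function name are assumptions for illustration only:

```rust
// Minimal sketch: the rebalancer skips a positive-affinity bundle while any
// member is still migrating (state names are illustrative assumptions).
fn bundle_settled(member_states: &[&str]) -> bool {
    member_states.iter().all(|state| *state == "started")
}

fn main() {
    // One member is still in 'migrate' state towards the common node, so the
    // bundle is not a rebalancing candidate yet, even with hold duration 0.
    assert!(!bundle_settled(&["migrate", "started", "started"]));
    // Once every member is 'started', rebalancing may consider the bundle.
    assert!(bundle_settled(&["started", "started", "started"]));
}
```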
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..181ea848
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 0
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..d35a2c8f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 },
+ "vm:104": { "cpu": 4.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
new file mode 100644
index 00000000..1242f827
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 23 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 60 node1/crm: auto rebalance - migrate vm:102 to node2 (expected target imbalance: 0.72)
+info 60 node1/crm: got crm command: migrate vm:102 node2
+info 60 node1/crm: migrate service 'vm:102' to node 'node2'
+info 60 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 61 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 61 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 80 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 83 node2/lrm: starting service vm:102
+info 83 node2/lrm: service status vm:102 started
+info 100 node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.27)
+info 100 node1/crm: got crm command: migrate vm:101 node3
+info 100 node1/crm: crm command 'migrate vm:101 node3' - migrate service 'vm:103' to node 'node3' (service 'vm:103' in positive affinity with service 'vm:101')
+info 100 node1/crm: migrate service 'vm:101' to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 100 node1/crm: migrate service 'vm:103' to node 'node3'
+info 100 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 101 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 105 node3/lrm: got lock 'ha_agent_node3_lock'
+info 105 node3/lrm: status change wait_for_agent_lock => active
+info 120 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 120 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 125 node3/lrm: starting service vm:101
+info 125 node3/lrm: service status vm:101 started
+info 125 node3/lrm: starting service vm:103
+info 125 node3/lrm: service status vm:103 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
new file mode 100644
index 00000000..cf90037c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
@@ -0,0 +1,31 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"online",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node2",
+ "state": "migrate",
+ "target": "node1",
+ "uid": "RoPGTlvNYq/oZFokv9fgWw"
+ },
+ "vm:102": {
+ "node": "node1",
+ "state": "started",
+ "uid": "fR3i18EHk6DhF8Zd2jddNX"
+ },
+ "vm:103": {
+ "node": "node1",
+ "state": "started",
+ "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+ },
+ "vm:104": {
+ "node": "node1",
+ "state": "started",
+ "uid": "23hk23EHk6DhF8Zd0218DD"
+ }
+ }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
new file mode 100644
index 00000000..2c3f3171
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:103
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
new file mode 100644
index 00000000..3dadaabc
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/README b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
new file mode 100644
index 00000000..e304cc22
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance a HA resource onto the same node as another HA resource that it is
+in a strict negative resource affinity rule with.
+
+There is a high node imbalance, since vm:101 and vm:102 on node1 cause a
+higher usage than node2 and node3 have. Even though it would be ideal to move
+one of them to node2 because of its very low usage, neither can be moved
+there, as vm:101 and vm:102 are each in a strict negative resource affinity
+rule with a HA resource on node2.
+
+To minimize the imbalance in the cluster, one of the HA resources from node1
+is migrated to node3 first; afterwards, the HA resource on node3, which is
+not in a strict negative resource affinity rule with a HA resource on node2,
+is migrated to node2.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..083f338b
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 4294967296 },
+ "vm:102": { "cpu": 2.4, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.0, "mem": 0 },
+ "vm:104": { "cpu": 1.0, "mem": 1073741824 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
new file mode 100644
index 00000000..58f1b481
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 80 node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.72)
+info 80 node1/crm: got crm command: migrate vm:101 node3
+info 80 node1/crm: migrate service 'vm:101' to node 'node3'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 105 node3/lrm: starting service vm:101
+info 105 node3/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.33)
+info 160 node1/crm: got crm command: migrate vm:104 node2
+info 160 node1/crm: migrate service 'vm:104' to node 'node2'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:104 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:104 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:104
+info 183 node2/lrm: service status vm:104 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
new file mode 100644
index 00000000..eef5460f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
@@ -0,0 +1,7 @@
+resource-affinity: vms-stay-apart1
+ resources vm:101,vm:103
+ affinity negative
+
+resource-affinity: vms-stay-apart2
+ resources vm:102,vm:103
+ affinity negative
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
new file mode 100644
index 00000000..16bffacf
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
--
2.47.3
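The constraint exercised by test-crs-dynamic-constrained-auto-rebalance4 — pick the
migration that best reduces node load while never co-locating services from a strict
negative affinity rule — can be sketched as follows. This is purely illustrative and
not the ha-manager implementation: the function name `rebalance_step`, the greedy
"minimize the maximum per-node CPU load" score, and the plain-dict inputs are all
assumptions for the sketch; the real scheduler uses its own usage metrics and
imbalance score (the "expected target imbalance" values in log.expect).

```python
# Hypothetical sketch (not the actual ha-manager code): one greedy rebalance
# step that respects strict negative resource affinity, mirroring the
# scenario in test-crs-dynamic-constrained-auto-rebalance4.
from itertools import product


def rebalance_step(placement, cpu_usage, negative_pairs):
    """Return the (service, target) migration that minimizes the maximum
    per-node CPU load, skipping any target that already hosts a negative
    affinity partner of the service; None if no migration is possible."""

    def node_load(pl):
        # Sum up CPU usage per node for a given placement.
        load = {}
        for svc, node in pl.items():
            load[node] = load.get(node, 0.0) + cpu_usage[svc]
        return load

    nodes = set(placement.values())
    best = None
    for svc, target in product(placement, nodes):
        if placement[svc] == target:
            continue
        # Strict negative affinity: partners may never share a node.
        partners = {b for a, b in negative_pairs if a == svc} | {
            a for a, b in negative_pairs if b == svc
        }
        if any(placement[p] == target for p in partners):
            continue
        trial = dict(placement, **{svc: target})
        score = max(node_load(trial).values())
        if best is None or score < best[0]:
            best = (score, svc, target)
    return (best[1], best[2]) if best else None
```

With the test's dynamic stats (vm:101 at 0.9 and vm:102 at 2.4 CPU on node1,
vm:103 at 0.0 on node2, vm:104 at 1.0 on node3) and the two negative affinity
rules, node2 is excluded as a target for both node1 services, so the sketch picks
migrating vm:101 to node3 — the same first move the expected log shows.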