* [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer
@ 2026-03-30 14:30 Daniel Kral
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Most of the patches for v3 are already R-b'd by @Dominik (many
thanks!). Much less has changed than from v1 -> v2, but I've still
added per-patch changelogs to make reviewing the rest more
straightforward.

Most changes went into

- #05 - resource-scheduling: implement generic cluster usage
        implementation
- #09 - resource-scheduling: implement rebalancing migration selection
- #13 - pve-rs: resource-scheduling: use generic usage implementation
- #19 - datacenter config: add auto rebalancing options
- #30 - usage: use add_service to add service usage to nodes

and the last few ha-manager patches should still get a review before
applying is considered. I'll follow up with the changes for pve-docs and
pve-manager as soon as these patches are in a final state.

Otherwise, the patches have passed per-patch build tests; I'll do some
more testing on some Proxmox VE setups tomorrow and give an update
here.

Thank you very much for the feedback, @Dominik, @Thomas, @Maximiliano,
and @Jillian Morgan!



fixes v2 -> v3:
- correctly set the running state for HA resources in usage accounting
- fix an error in Usage::add_resource_usage_to_node(): it now correctly
  handles adding usage to a non-existent node (previously it would
  still move an existing resource into the moving state), and add a
  test case for that

changes v2 -> v3:
- rebased all repositories on master
- move unset current_node invariant handling from proxmox-perl-rs to
  pve-ha-manager
- made the logic in (proxmox-resource-scheduling) Usage::add_resource()
  and Usage::add_resource_usage_to_node() more explicit by inlining it
- drop Usage::add_resource_to_nodes()
- drop Usage::remove_resource_from_nodes()
- drop Resource::moving_to() method
- drop Resource::nodenames() and ResourcePlacement::nodenames() methods
- s/to_string/to_owned/ where sensible
- use assert!()s instead of bail!()
- do not compare imbalance score in scheduler tests, but only the order
  of the migrations returned by
  score_best_balancing_migration_candidates{,_topsis}()



This patch series implements a dynamic scheduler and a manual/automatic,
static/dynamic load rebalancer by:

- gathering dynamic node and service usage information and using it in
  the dynamic scheduler, and

- implementing a load rebalancer, which actively moves HA resources to
  other nodes to lower the overall cluster node imbalance, while
  adhering to the HA rules.



== Model ==

The automatic load rebalancing system checks whether the cluster node
imbalance exceeds a user-defined threshold for a number of consecutive
HA Manager rounds (the "hold duration"). If it does, it chooses the
service migration/relocation that best improves the cluster node
imbalance and queues it, but only if the improvement exceeds a
user-defined margin.

The best service migration can be selected either by brute force or by
TOPSIS. The selection method and the other parameters above can be
tweaked at runtime.
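For illustration, the trigger logic above could be sketched roughly like
this (the names and the relative interpretation of the margin are mine,
not the actual implementation):

```rust
// Hypothetical sketch of the rebalancing trigger described above.
struct RebalanceState {
    rounds_over_threshold: u32,
}

impl RebalanceState {
    fn new() -> Self {
        Self { rounds_over_threshold: 0 }
    }

    /// Called once per HA Manager round. Returns true once the cluster
    /// imbalance has exceeded the threshold for `hold_rounds`
    /// consecutive rounds (the "hold duration").
    fn tick(&mut self, imbalance: f64, threshold: f64, hold_rounds: u32) -> bool {
        if imbalance > threshold {
            self.rounds_over_threshold += 1;
        } else {
            // any round below the threshold resets the hold duration
            self.rounds_over_threshold = 0;
        }
        self.rounds_over_threshold >= hold_rounds
    }
}

/// A candidate migration is only queued if it improves the imbalance
/// by at least the user-defined margin (interpreted here as a relative
/// improvement, which is an assumption for this sketch).
fn improves_enough(current: f64, candidate: f64, margin: f64) -> bool {
    candidate < current * (1.0 - margin)
}
```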



== Tests ==

I've added some test cases to document the more basic decisions. The
other tests were done in virtualized clusters, adding load to guests
dynamically with stress-ng, though I plan to rely more on real-world
load simulators for the next batch of tests.



== Benchmarks ==

I've also run some synthetic benchmarks targeting a 48-node cluster
with 9,999 HA resources / guests and a worst-case scenario where each
HA resource is part of 3 HA rules (pairwise positive and negative
resource affinity rules, where each positive resource affinity pair
shares a common node affinity rule).

Generating the migration candidates for this huge cluster with the
worst-case HA ruleset takes 243 +- 9 ms.

Generating them without the worst-case HA ruleset (which yields the
maximum of 459,954 migration candidates) takes 356 +- 6 ms. This is
expected: without HA resource bundles there are more individual HA
resources whose rules need to be evaluated.

Excluding the generation step, the brute force and TOPSIS methods for
select_best_balancing_migration() performed roughly the same, both in
the range of 350 +- 50 ms for the huge cluster without any HA rules
(i.e., for the maximum number of migration candidates), including the
serialization between Perl and Rust.
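For context, TOPSIS ranks alternatives by their relative closeness to
an ideal point. A generic, minimal sketch (equal weights, benefit
criteria only; this is not the actual topsis.rs implementation) looks
like:

```rust
/// Minimal TOPSIS over a decision matrix (rows = alternatives,
/// columns = criteria; all criteria treated as benefit criteria with
/// equal weights). Returns the index of the alternative with the
/// highest relative closeness to the ideal solution.
fn topsis_best(matrix: &[Vec<f64>]) -> usize {
    let n_crit = matrix[0].len();
    // Vector-normalize each column.
    let norms: Vec<f64> = (0..n_crit)
        .map(|j| matrix.iter().map(|r| r[j] * r[j]).sum::<f64>().sqrt())
        .collect();
    let normed: Vec<Vec<f64>> = matrix
        .iter()
        .map(|r| {
            r.iter()
                .zip(&norms)
                .map(|(v, n)| if *n > 0.0 { v / n } else { 0.0 })
                .collect()
        })
        .collect();
    // Ideal (column maxima) and anti-ideal (column minima) points.
    let ideal: Vec<f64> = (0..n_crit)
        .map(|j| normed.iter().map(|r| r[j]).fold(f64::MIN, f64::max))
        .collect();
    let anti: Vec<f64> = (0..n_crit)
        .map(|j| normed.iter().map(|r| r[j]).fold(f64::MAX, f64::min))
        .collect();
    let dist = |a: &[f64], b: &[f64]| -> f64 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
    };
    // Relative closeness: d_anti / (d_ideal + d_anti); higher is better.
    normed
        .iter()
        .enumerate()
        .map(|(i, r)| (i, dist(r, &anti) / (dist(r, &ideal) + dist(r, &anti))))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```

Brute force instead evaluates the imbalance score of every candidate
directly, which is why the two methods can end up in the same time
range once candidate generation dominates.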



== Future ideas ==

- include the migration costs in score_best_balancing_migrations(),
  e.g., so that VMs with lots of memory are less likely to be migrated
  if the link between the nodes is slow, but that would need measuring
  and storing the migration network link speeds as a mesh

- apply a filter such as a moving-average window or exponential
  smoothing to the usage time series to dampen spikes; triple
  exponential smoothing (Holt-Winters) is already implemented in
  rrdcached and allows for exponential smoothing with better time
  series analysis, but would require changing the rrdcached data
  structure once more

- score_best_balancing_migrations(...) can already provide a
  size-limited list of the best migrations, which could be exposed to
  users to allow manual load balancing actions, e.g., from the web
  interface, and to give some insight into the system

- the current scheduler only solves bin covering, but it would be
  interesting to also allow bin packing if certain criteria are met,
  e.g., to save energy while the overall cluster load is low

- Allow individual HA resources to be actively excluded from the
  automatic rebalancing, e.g., because containers cannot be live
  migrated.

- move the migration candidate generation to the Rust side; generating
  on the Perl side was chosen first to reduce code duplication, but it
  doesn't seem future-proof or right to copy state into the
  online_node_usage object twice (medium priority)
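For the smoothing idea above, simple (single) exponential smoothing
would already dampen spikes; a sketch (the function name and the
seeding choice are illustrative):

```rust
/// Simple exponential smoothing, a possible pre-filter for the usage
/// time series. `alpha` close to 1 tracks the raw signal; smaller
/// values dampen spikes more aggressively.
fn exponential_smoothing(series: &[f64], alpha: f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(series.len());
    let mut prev: Option<f64> = None;
    for &x in series {
        let s = match prev {
            None => x, // seed with the first observation
            Some(p) => alpha * x + (1.0 - alpha) * p,
        };
        out.push(s);
        prev = Some(s);
    }
    out
}
```

With alpha = 0.5, a single load spike of 10.0 in an otherwise flat
series of 1.0 only raises the smoothed value to 5.5 and decays from
there, so one noisy sample is less likely to trigger a rebalancing.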



== Diffstat ==


proxmox:

Daniel Kral (9):
  resource-scheduling: inline add_cpu_usage in
    score_nodes_to_start_service
  resource-scheduling: move score_nodes_to_start_service to scheduler
    crate
  resource-scheduling: rename service to resource where appropriate
  resource-scheduling: introduce generic scheduler implementation
  resource-scheduling: implement generic cluster usage implementation
  resource-scheduling: topsis: handle empty criteria without panics
  resource-scheduling: compare by nodename in
    score_nodes_to_start_resource
  resource-scheduling: factor out topsis alternative mapping
  resource-scheduling: implement rebalancing migration selection

 proxmox-resource-scheduling/src/lib.rs        |   9 +
 proxmox-resource-scheduling/src/node.rs       |  96 +++++
 proxmox-resource-scheduling/src/pve_static.rs | 102 ++---
 proxmox-resource-scheduling/src/resource.rs   | 117 +++++
 proxmox-resource-scheduling/src/scheduler.rs  | 407 ++++++++++++++++++
 proxmox-resource-scheduling/src/topsis.rs     |   6 +-
 proxmox-resource-scheduling/src/usage.rs      | 208 +++++++++
 .../tests/scheduler.rs                        | 379 ++++++++++++++++
 proxmox-resource-scheduling/tests/usage.rs    | 181 ++++++++
 9 files changed, 1437 insertions(+), 68 deletions(-)
 create mode 100644 proxmox-resource-scheduling/src/node.rs
 create mode 100644 proxmox-resource-scheduling/src/resource.rs
 create mode 100644 proxmox-resource-scheduling/src/scheduler.rs
 create mode 100644 proxmox-resource-scheduling/src/usage.rs
 create mode 100644 proxmox-resource-scheduling/tests/scheduler.rs
 create mode 100644 proxmox-resource-scheduling/tests/usage.rs


perl-rs:

Daniel Kral (7):
  pve-rs: resource-scheduling: remove pedantic error handling from
    remove_node
  pve-rs: resource-scheduling: remove pedantic error handling from
    remove_service_usage
  pve-rs: resource-scheduling: move pve_static into resource_scheduling
    module
  pve-rs: resource-scheduling: use generic usage implementation
  pve-rs: resource-scheduling: static: replace deprecated usage structs
  pve-rs: resource-scheduling: implement pve_dynamic bindings
  pve-rs: resource-scheduling: expose auto rebalancing methods

 pve-rs/Makefile                               |   1 +
 pve-rs/src/bindings/mod.rs                    |   3 +-
 .../src/bindings/resource_scheduling/mod.rs   |  10 +
 .../resource_scheduling/pve_dynamic.rs        | 227 ++++++++++++++++++
 .../resource_scheduling/pve_static.rs         | 225 +++++++++++++++++
 .../bindings/resource_scheduling/resource.rs  | 126 ++++++++++
 .../src/bindings/resource_scheduling/usage.rs |  81 +++++++
 .../bindings/resource_scheduling_static.rs    | 215 -----------------
 pve-rs/test/resource_scheduling.pl            |   1 +
 9 files changed, 672 insertions(+), 217 deletions(-)
 create mode 100644 pve-rs/src/bindings/resource_scheduling/mod.rs
 create mode 100644 pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
 create mode 100644 pve-rs/src/bindings/resource_scheduling/pve_static.rs
 create mode 100644 pve-rs/src/bindings/resource_scheduling/resource.rs
 create mode 100644 pve-rs/src/bindings/resource_scheduling/usage.rs
 delete mode 100644 pve-rs/src/bindings/resource_scheduling_static.rs


cluster:

Daniel Kral (3):
  datacenter config: restructure verbose description for the ha crs
    option
  datacenter config: add dynamic load scheduler option
  datacenter config: add auto rebalancing options

 src/PVE/DataCenterConfig.pm | 53 ++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 4 deletions(-)


ha-manager:

Daniel Kral (15):
  env: pve2: implement dynamic node and service stats
  usage: pass service data to add_service_usage
  usage: pass service data to get_used_service_nodes
  add running flag to non-HA cluster service stats
  usage: use add_service to add service usage to nodes
  usage: add dynamic usage scheduler
  test: add dynamic usage scheduler test cases
  manager: rename execute_migration to queue_resource_motion
  manager: update_crs_scheduler_mode: factor out crs config
  implement automatic rebalancing
  test: add resource bundle generation test cases
  test: add dynamic automatic rebalancing system test cases
  test: add static automatic rebalancing system test cases
  test: add automatic rebalancing system test cases with TOPSIS method
  test: add automatic rebalancing system test cases with affinity rules

Dominik Rusovac (6):
  sim: hardware: pass correct types for static stats
  sim: hardware: factor out static stats' default values
  sim: hardware: fix static stats guard
  sim: hardware: handle dynamic service stats
  sim: hardware: add set-dynamic-stats command
  sim: hardware: add getters for dynamic {node,service} stats

 debian/pve-ha-manager.install                 |   1 +
 src/PVE/HA/Env.pm                             |  12 +
 src/PVE/HA/Env/PVE2.pm                        |  64 +++++
 src/PVE/HA/Manager.pm                         | 220 +++++++++++++++-
 src/PVE/HA/Rules/ResourceAffinity.pm          |   3 +-
 src/PVE/HA/Sim/Env.pm                         |  12 +
 src/PVE/HA/Sim/Hardware.pm                    | 185 ++++++++++++--
 src/PVE/HA/Sim/RTHardware.pm                  |   4 +-
 src/PVE/HA/Usage.pm                           |  64 +++--
 src/PVE/HA/Usage/Basic.pm                     |   9 +-
 src/PVE/HA/Usage/Dynamic.pm                   | 155 ++++++++++++
 src/PVE/HA/Usage/Makefile                     |   2 +-
 src/PVE/HA/Usage/Static.pm                    |  63 ++++-
 src/test/Makefile                             |   1 +
 .../README                                    |   2 +
 .../cmdlist                                   |   3 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   1 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  11 +
 .../manager_status                            |   1 +
 .../service_config                            |   1 +
 .../static_service_stats                      |   1 +
 .../README                                    |   7 +
 .../cmdlist                                   |   3 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   3 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  25 ++
 .../manager_status                            |   1 +
 .../service_config                            |   3 +
 .../static_service_stats                      |   3 +
 .../README                                    |   4 +
 .../cmdlist                                   |   3 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   6 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../service_config                            |   6 +
 .../static_service_stats                      |   6 +
 .../README                                    |   4 +
 .../cmdlist                                   |  16 ++
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   9 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  80 ++++++
 .../manager_status                            |   1 +
 .../service_config                            |   9 +
 .../static_service_stats                      |   9 +
 .../README                                    |  11 +
 .../cmdlist                                   |  13 +
 .../datacenter.cfg                            |   9 +
 .../dynamic_service_stats                     |   9 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../service_config                            |   9 +
 .../static_service_stats                      |   9 +
 .../test-crs-dynamic-auto-rebalance0/README   |   2 +
 .../test-crs-dynamic-auto-rebalance0/cmdlist  |   3 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   1 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  11 +
 .../manager_status                            |   1 +
 .../service_config                            |   1 +
 .../static_service_stats                      |   1 +
 .../test-crs-dynamic-auto-rebalance1/README   |   7 +
 .../test-crs-dynamic-auto-rebalance1/cmdlist  |   3 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   3 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  25 ++
 .../manager_status                            |   1 +
 .../service_config                            |   3 +
 .../static_service_stats                      |   3 +
 .../test-crs-dynamic-auto-rebalance2/README   |   4 +
 .../test-crs-dynamic-auto-rebalance2/cmdlist  |   3 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   6 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../service_config                            |   6 +
 .../static_service_stats                      |   6 +
 .../test-crs-dynamic-auto-rebalance3/README   |   4 +
 .../test-crs-dynamic-auto-rebalance3/cmdlist  |  16 ++
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   9 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  80 ++++++
 .../manager_status                            |   1 +
 .../service_config                            |   9 +
 .../static_service_stats                      |   9 +
 .../test-crs-dynamic-auto-rebalance4/README   |  11 +
 .../test-crs-dynamic-auto-rebalance4/cmdlist  |  13 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   9 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../service_config                            |   9 +
 .../static_service_stats                      |   9 +
 .../README                                    |   7 +
 .../cmdlist                                   |   8 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   5 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  49 ++++
 .../manager_status                            |   1 +
 .../rules_config                              |   4 +
 .../service_config                            |   5 +
 .../static_service_stats                      |   5 +
 .../README                                    |  12 +
 .../cmdlist                                   |   8 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   4 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  53 ++++
 .../manager_status                            |   1 +
 .../rules_config                              |   3 +
 .../service_config                            |   4 +
 .../static_service_stats                      |   4 +
 .../README                                    |  14 ++
 .../cmdlist                                   |   3 +
 .../datacenter.cfg                            |   8 +
 .../dynamic_service_stats                     |   6 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |  31 +++
 .../rules_config                              |   3 +
 .../service_config                            |   6 +
 .../static_service_stats                      |   6 +
 .../README                                    |  14 ++
 .../cmdlist                                   |   3 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   6 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../rules_config                              |   7 +
 .../service_config                            |   6 +
 .../static_service_stats                      |   6 +
 src/test/test-crs-dynamic-rebalance1/README   |   3 +
 src/test/test-crs-dynamic-rebalance1/cmdlist  |   4 +
 .../datacenter.cfg                            |   7 +
 .../dynamic_service_stats                     |   7 +
 .../hardware_status                           |   5 +
 .../test-crs-dynamic-rebalance1/log.expect    |  82 ++++++
 .../manager_status                            |   1 +
 .../service_config                            |   7 +
 .../static_service_stats                      |   7 +
 src/test/test-crs-dynamic1/README             |   4 +
 src/test/test-crs-dynamic1/cmdlist            |   4 +
 src/test/test-crs-dynamic1/datacenter.cfg     |   6 +
 .../test-crs-dynamic1/dynamic_service_stats   |   3 +
 src/test/test-crs-dynamic1/hardware_status    |   5 +
 src/test/test-crs-dynamic1/log.expect         |  51 ++++
 src/test/test-crs-dynamic1/manager_status     |   1 +
 src/test/test-crs-dynamic1/service_config     |   3 +
 .../test-crs-dynamic1/static_service_stats    |   3 +
 .../test-crs-static-auto-rebalance1/README    |   7 +
 .../test-crs-static-auto-rebalance1/cmdlist   |   3 +
 .../datacenter.cfg                            |   7 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  25 ++
 .../manager_status                            |   1 +
 .../service_config                            |   3 +
 .../static_service_stats                      |   3 +
 .../test-crs-static-auto-rebalance2/README    |   4 +
 .../test-crs-static-auto-rebalance2/cmdlist   |   3 +
 .../datacenter.cfg                            |   7 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  59 +++++
 .../manager_status                            |   1 +
 .../service_config                            |   6 +
 .../static_service_stats                      |   6 +
 .../test-crs-static-auto-rebalance3/README    |   3 +
 .../test-crs-static-auto-rebalance3/cmdlist   |  15 ++
 .../datacenter.cfg                            |   7 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  79 ++++++
 .../manager_status                            |   1 +
 .../service_config                            |   9 +
 .../static_service_stats                      |   9 +
 src/test/test_resource_bundles.pl             | 234 ++++++++++++++++++
 187 files changed, 2813 insertions(+), 50 deletions(-)
 create mode 100644 src/PVE/HA/Usage/Dynamic.pm
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic1/README
 create mode 100644 src/test/test-crs-dynamic1/cmdlist
 create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic1/hardware_status
 create mode 100644 src/test/test-crs-dynamic1/log.expect
 create mode 100644 src/test/test-crs-dynamic1/manager_status
 create mode 100644 src/test/test-crs-dynamic1/service_config
 create mode 100644 src/test/test-crs-dynamic1/static_service_stats
 create mode 100644 src/test/test-crs-static-auto-rebalance1/README
 create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-static-auto-rebalance2/README
 create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-static-auto-rebalance3/README
 create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats
 create mode 100755 src/test/test_resource_bundles.pl


Summary over all repositories:
  206 files changed, 4971 insertions(+), 339 deletions(-)

-- 
Generated by murpp 0.11.0




^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  6:01   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
                   ` (40 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

This makes the subsequent move of the function into its own module
easier to follow; that move is needed to generalize
score_nodes_to_start_service(...) for other usage stats in the
following patches.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/pve_static.rs | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index b81086dd..fd5e5ffc 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -94,7 +94,11 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
             for (index, node) in nodes.iter().enumerate() {
                 let node = node.as_ref();
                 let new_cpu = if index == target_index {
-                    add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
+                    if service.maxcpu == 0.0 {
+                        node.cpu + node.maxcpu as f64
+                    } else {
+                        node.cpu + service.maxcpu
+                    }
                 } else {
                     node.cpu
                 } / (node.maxcpu as f64);
-- 
2.47.3
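The inlined branch above encodes the convention that a service `maxcpu` of
`0.0` means "no CPU limit", in which case the node's full core count is
charged. A minimal standalone sketch of that convention (the helper name
`projected_cpu_load` is made up for illustration and is not part of the
patch):

```rust
// Standalone illustration (not from the patch) of the inlined logic:
// a service maxcpu of 0.0 means "unlimited", so the full node core
// count is charged when projecting the node's CPU load.
fn projected_cpu_load(node_cpu: f64, node_maxcpu: f64, service_maxcpu: f64) -> f64 {
    let added = if service_maxcpu == 0.0 {
        node_maxcpu // unlimited service: assume it may use every core
    } else {
        service_maxcpu
    };
    (node_cpu + added) / node_maxcpu
}

fn main() {
    // Limited service: 2.0 cores used + 2.0 core limit on an 8-core node.
    assert!((projected_cpu_load(2.0, 8.0, 2.0) - 0.5).abs() < 1e-12);
    // Unlimited service: projected as (2.0 + 8.0) / 8.0 = 1.25.
    assert!((projected_cpu_load(2.0, 8.0, 0.0) - 1.25).abs() < 1e-12);
}
```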






* [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  6:01   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
                   ` (39 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

This is done so that score_nodes_to_start_service(...) can be
generalized in the following patches, allowing other usage stat structs
to reuse the same scoring method.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/lib.rs        |  2 +
 proxmox-resource-scheduling/src/pve_static.rs | 76 +---------------
 proxmox-resource-scheduling/src/scheduler.rs  | 90 +++++++++++++++++++
 3 files changed, 94 insertions(+), 74 deletions(-)
 create mode 100644 proxmox-resource-scheduling/src/scheduler.rs

diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 47980259..c73e7b1e 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,4 +1,6 @@
 #[macro_use]
 pub mod topsis;
 
+pub mod scheduler;
+
 pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index fd5e5ffc..5df0be37 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,7 +1,7 @@
 use anyhow::Error;
 use serde::{Deserialize, Serialize};
 
-use crate::topsis;
+use crate::scheduler;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -53,23 +53,6 @@ pub struct StaticServiceUsage {
     pub maxmem: usize,
 }
 
-criteria_struct! {
-    /// A given alternative.
-    struct PveTopsisAlternative {
-        #[criterion("average CPU", -1.0)]
-        average_cpu: f64,
-        #[criterion("highest CPU", -2.0)]
-        highest_cpu: f64,
-        #[criterion("average memory", -5.0)]
-        average_memory: f64,
-        #[criterion("highest memory", -10.0)]
-        highest_memory: f64,
-    }
-
-    const N_CRITERIA;
-    static PVE_HA_TOPSIS_CRITERIA;
-}
-
 /// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
 /// and CPU usages of the nodes as if the service would already be running on each.
 ///
@@ -79,60 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
     nodes: &[T],
     service: &StaticServiceUsage,
 ) -> Result<Vec<(String, f64)>, Error> {
-    let len = nodes.len();
-
-    let matrix = nodes
-        .iter()
-        .enumerate()
-        .map(|(target_index, _)| {
-            // Base values on percentages to allow comparing nodes with different stats.
-            let mut highest_cpu = 0.0;
-            let mut squares_cpu = 0.0;
-            let mut highest_mem = 0.0;
-            let mut squares_mem = 0.0;
-
-            for (index, node) in nodes.iter().enumerate() {
-                let node = node.as_ref();
-                let new_cpu = if index == target_index {
-                    if service.maxcpu == 0.0 {
-                        node.cpu + node.maxcpu as f64
-                    } else {
-                        node.cpu + service.maxcpu
-                    }
-                } else {
-                    node.cpu
-                } / (node.maxcpu as f64);
-                highest_cpu = f64::max(highest_cpu, new_cpu);
-                squares_cpu += new_cpu.powi(2);
-
-                let new_mem = if index == target_index {
-                    node.mem + service.maxmem
-                } else {
-                    node.mem
-                } as f64
-                    / node.maxmem as f64;
-                highest_mem = f64::max(highest_mem, new_mem);
-                squares_mem += new_mem.powi(2);
-            }
-
-            // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
-            // 1.004 is only slightly more than 1.002.
-            PveTopsisAlternative {
-                average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
-                highest_cpu: 1.0 + highest_cpu,
-                average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
-                highest_memory: 1.0 + highest_mem,
-            }
-            .into()
-        })
-        .collect::<Vec<_>>();
-
-    let scores =
-        topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
-    Ok(scores
-        .into_iter()
-        .enumerate()
-        .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
-        .collect())
+    scheduler::score_nodes_to_start_service(nodes, service)
 }
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
new file mode 100644
index 00000000..385015e3
--- /dev/null
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -0,0 +1,90 @@
+use anyhow::Error;
+
+use crate::{
+    pve_static::{StaticNodeUsage, StaticServiceUsage},
+    topsis,
+};
+
+criteria_struct! {
+    /// A given alternative.
+    struct PveTopsisAlternative {
+        #[criterion("average CPU", -1.0)]
+        average_cpu: f64,
+        #[criterion("highest CPU", -2.0)]
+        highest_cpu: f64,
+        #[criterion("average memory", -5.0)]
+        average_memory: f64,
+        #[criterion("highest memory", -10.0)]
+        highest_memory: f64,
+    }
+
+    const N_CRITERIA;
+    static PVE_HA_TOPSIS_CRITERIA;
+}
+
+/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the service would already be running on each.
+///
+/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
+/// is better.
+pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+    nodes: &[T],
+    service: &StaticServiceUsage,
+) -> Result<Vec<(String, f64)>, Error> {
+    let len = nodes.len();
+
+    let matrix = nodes
+        .iter()
+        .enumerate()
+        .map(|(target_index, _)| {
+            // Base values on percentages to allow comparing nodes with different stats.
+            let mut highest_cpu = 0.0;
+            let mut squares_cpu = 0.0;
+            let mut highest_mem = 0.0;
+            let mut squares_mem = 0.0;
+
+            for (index, node) in nodes.iter().enumerate() {
+                let node = node.as_ref();
+                let new_cpu = if index == target_index {
+                    if service.maxcpu == 0.0 {
+                        node.cpu + node.maxcpu as f64
+                    } else {
+                        node.cpu + service.maxcpu
+                    }
+                } else {
+                    node.cpu
+                } / (node.maxcpu as f64);
+                highest_cpu = f64::max(highest_cpu, new_cpu);
+                squares_cpu += new_cpu.powi(2);
+
+                let new_mem = if index == target_index {
+                    node.mem + service.maxmem
+                } else {
+                    node.mem
+                } as f64
+                    / node.maxmem as f64;
+                highest_mem = f64::max(highest_mem, new_mem);
+                squares_mem += new_mem.powi(2);
+            }
+
+            // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+            // 1.004 is only slightly more than 1.002.
+            PveTopsisAlternative {
+                average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+                highest_cpu: 1.0 + highest_cpu,
+                average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+                highest_memory: 1.0 + highest_mem,
+            }
+            .into()
+        })
+        .collect::<Vec<_>>();
+
+    let scores =
+        topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+    Ok(scores
+        .into_iter()
+        .enumerate()
+        .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
+        .collect())
+}
-- 
2.47.3






* [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  6:02   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
                   ` (38 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The term `resource` is more appropriate with respect to the crate name
and is also the preferred term in the crate's current main consumer,
the HA Manager.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/pve_static.rs |  2 +-
 proxmox-resource-scheduling/src/scheduler.rs  | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index 5df0be37..c7e1d1b1 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -62,5 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
     nodes: &[T],
     service: &StaticServiceUsage,
 ) -> Result<Vec<(String, f64)>, Error> {
-    scheduler::score_nodes_to_start_service(nodes, service)
+    scheduler::score_nodes_to_start_resource(nodes, service)
 }
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 385015e3..39ee44ce 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -22,14 +22,14 @@ criteria_struct! {
     static PVE_HA_TOPSIS_CRITERIA;
 }
 
-/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the service would already be running on each.
+/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the resource would already be running on each.
 ///
 /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
 /// is better.
-pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
     nodes: &[T],
-    service: &StaticServiceUsage,
+    resource: &StaticServiceUsage,
 ) -> Result<Vec<(String, f64)>, Error> {
     let len = nodes.len();
 
@@ -46,10 +46,10 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
             for (index, node) in nodes.iter().enumerate() {
                 let node = node.as_ref();
                 let new_cpu = if index == target_index {
-                    if service.maxcpu == 0.0 {
+                    if resource.maxcpu == 0.0 {
                         node.cpu + node.maxcpu as f64
                     } else {
-                        node.cpu + service.maxcpu
+                        node.cpu + resource.maxcpu
                     }
                 } else {
                     node.cpu
@@ -58,7 +58,7 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
                 squares_cpu += new_cpu.powi(2);
 
                 let new_mem = if index == target_index {
-                    node.mem + service.maxmem
+                    node.mem + resource.maxmem
                 } else {
                     node.mem
                 } as f64
-- 
2.47.3






* [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (2 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  6:11   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
                   ` (37 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The existing score_nodes_to_start_resource(...) function depends on the
StaticNodeUsage and StaticServiceUsage structs.

To use this function with other usage stat structs as well, declare
generic NodeStats and ResourceStats structs that callers can convert
their own types into. These are used to make
score_nodes_to_start_resource(...) and its documentation generic.

The pve_static::score_nodes_to_start_service(...) function is marked as
deprecated accordingly. The usage-related structs are marked as
deprecated as well, since the specific usage implementations -
including their serialization and deserialization - should now be
handled by the caller.

This is best viewed with the git option --ignore-all-space.

No functional changes intended.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- add Clone and Debug derives to Scheduler struct
- change second unlimited_cpu_resource_stats variable to
  combined_resource_stats; was a Copy-Paste error
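
The generic stats semantics this patch introduces can be sketched in a
self-contained way (this mirrors the `NodeStats` methods added in
src/node.rs below; it is an illustration, not the crate's actual code):

```rust
// Sketch of the NodeStats semantics introduced in this patch; the real
// type lives in proxmox-resource-scheduling/src/node.rs.
#[derive(Copy, Clone, Debug, Default)]
struct NodeStats {
    cpu: f64,      // CPU utilization in cores
    maxcpu: usize, // total cores
    mem: usize,    // used memory in bytes
    maxmem: usize, // total memory in bytes
}

impl NodeStats {
    // A resource maxcpu of 0.0 means "unlimited": charge all node cores.
    fn add_started_resource(&mut self, res_maxcpu: f64, res_maxmem: usize) {
        let cpu = if res_maxcpu == 0.0 {
            self.maxcpu as f64
        } else {
            res_maxcpu
        };
        self.cpu += cpu;
        self.mem += res_maxmem;
    }

    fn cpu_load(&self) -> f64 {
        self.cpu / self.maxcpu as f64
    }

    fn mem_load(&self) -> f64 {
        self.mem as f64 / self.maxmem as f64
    }
}

fn main() {
    let mut node = NodeStats { cpu: 2.0, maxcpu: 8, mem: 4 << 30, maxmem: 32 << 30 };
    // Starting a resource with a 2-core limit and 2 GiB of memory:
    node.add_started_resource(2.0, 2 << 30);
    assert!((node.cpu_load() - 0.5).abs() < 1e-12); // (2.0 + 2.0) / 8
    assert!((node.mem_load() - 0.1875).abs() < 1e-12); // 6 GiB / 32 GiB
}
```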

 proxmox-resource-scheduling/src/lib.rs        |   6 +
 proxmox-resource-scheduling/src/node.rs       |  39 ++++
 proxmox-resource-scheduling/src/pve_static.rs |  46 +++-
 proxmox-resource-scheduling/src/resource.rs   |  33 +++
 proxmox-resource-scheduling/src/scheduler.rs  | 158 ++++++++------
 .../tests/scheduler.rs                        | 200 ++++++++++++++++++
 6 files changed, 409 insertions(+), 73 deletions(-)
 create mode 100644 proxmox-resource-scheduling/src/node.rs
 create mode 100644 proxmox-resource-scheduling/src/resource.rs
 create mode 100644 proxmox-resource-scheduling/tests/scheduler.rs

diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index c73e7b1e..12b743fe 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,6 +1,12 @@
 #[macro_use]
 pub mod topsis;
 
+pub mod node;
+pub mod resource;
+
 pub mod scheduler;
 
+// pve_static exists only for backwards compatibility to not break builds
+// The allow(deprecated) is to not report its own use of deprecated items
+#[allow(deprecated)]
 pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
new file mode 100644
index 00000000..e6227eda
--- /dev/null
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -0,0 +1,39 @@
+use crate::resource::ResourceStats;
+
+/// Usage statistics of a node.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct NodeStats {
+    /// CPU utilization in CPU cores.
+    pub cpu: f64,
+    /// Total number of CPU cores.
+    pub maxcpu: usize,
+    /// Used memory in bytes.
+    pub mem: usize,
+    /// Total memory in bytes.
+    pub maxmem: usize,
+}
+
+impl NodeStats {
+    /// Adds the resource stats to the node stats as if the resource has started on the node.
+    pub fn add_started_resource(&mut self, resource_stats: &ResourceStats) {
+        // a maxcpu value of `0.0` means no cpu usage limit on the node
+        let resource_cpu = if resource_stats.maxcpu == 0.0 {
+            self.maxcpu as f64
+        } else {
+            resource_stats.maxcpu
+        };
+
+        self.cpu += resource_cpu;
+        self.mem += resource_stats.maxmem;
+    }
+
+    /// Returns the current cpu usage as a percentage.
+    pub fn cpu_load(&self) -> f64 {
+        self.cpu / self.maxcpu as f64
+    }
+
+    /// Returns the current memory usage as a percentage.
+    pub fn mem_load(&self) -> f64 {
+        self.mem as f64 / self.maxmem as f64
+    }
+}
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index c7e1d1b1..229ee3c6 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,10 +1,12 @@
 use anyhow::Error;
 use serde::{Deserialize, Serialize};
 
-use crate::scheduler;
+use crate::scheduler::{NodeUsage, Scheduler};
+use crate::{node::NodeStats, resource::ResourceStats};
 
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
+#[deprecated = "specific node usage structs should be declared where they are used"]
 /// Static usage information of a node.
 pub struct StaticNodeUsage {
     /// Hostname of the node.
@@ -33,6 +35,22 @@ impl AsRef<StaticNodeUsage> for StaticNodeUsage {
     }
 }
 
+impl From<StaticNodeUsage> for NodeUsage {
+    fn from(usage: StaticNodeUsage) -> Self {
+        let stats = NodeStats {
+            cpu: usage.cpu,
+            maxcpu: usage.maxcpu,
+            mem: usage.mem,
+            maxmem: usage.maxmem,
+        };
+
+        Self {
+            name: usage.name,
+            stats,
+        }
+    }
+}
+
 /// Calculate new CPU usage in percent.
 /// `add` being `0.0` means "unlimited" and results in `max` being added.
 fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
@@ -43,8 +61,9 @@ fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
     }
 }
 
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Copy, Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
+#[deprecated = "specific service usage structs should be declared where they are used"]
 /// Static usage information of an HA resource.
 pub struct StaticServiceUsage {
     /// Number of assigned CPUs or CPU limit.
@@ -53,14 +72,33 @@ pub struct StaticServiceUsage {
     pub maxmem: usize,
 }
 
+impl From<StaticServiceUsage> for ResourceStats {
+    fn from(usage: StaticServiceUsage) -> Self {
+        Self {
+            cpu: usage.maxcpu,
+            maxcpu: usage.maxcpu,
+            mem: usage.maxmem,
+            maxmem: usage.maxmem,
+        }
+    }
+}
+
 /// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
 /// and CPU usages of the nodes as if the service would already be running on each.
 ///
 /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
 /// is better.
+#[deprecated = "use Scheduler::score_nodes_to_start_resource(...) directly instead"]
 pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
     nodes: &[T],
     service: &StaticServiceUsage,
 ) -> Result<Vec<(String, f64)>, Error> {
-    scheduler::score_nodes_to_start_resource(nodes, service)
+    let nodes = nodes
+        .iter()
+        .map(|node| node.as_ref().clone().into())
+        .collect::<Vec<NodeUsage>>();
+
+    let scheduler = Scheduler::from_nodes(nodes);
+
+    scheduler.score_nodes_to_start_resource(*service)
 }
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
new file mode 100644
index 00000000..1eb9d15e
--- /dev/null
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -0,0 +1,33 @@
+use std::{iter::Sum, ops::Add};
+
+/// Usage statistics for a resource.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct ResourceStats {
+    /// CPU utilization in CPU cores.
+    pub cpu: f64,
+    /// Number of assigned CPUs or CPU limit.
+    pub maxcpu: f64,
+    /// Used memory in bytes.
+    pub mem: usize,
+    /// Maximum assigned memory in bytes.
+    pub maxmem: usize,
+}
+
+impl Add for ResourceStats {
+    type Output = Self;
+
+    fn add(self, other: Self) -> Self {
+        Self {
+            cpu: self.cpu + other.cpu,
+            maxcpu: self.maxcpu + other.maxcpu,
+            mem: self.mem + other.mem,
+            maxmem: self.maxmem + other.maxmem,
+        }
+    }
+}
+
+impl Sum for ResourceStats {
+    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
+        iter.fold(Self::default(), |a, b| a + b)
+    }
+}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 39ee44ce..0a27a25c 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -1,9 +1,15 @@
 use anyhow::Error;
 
-use crate::{
-    pve_static::{StaticNodeUsage, StaticServiceUsage},
-    topsis,
-};
+use crate::{node::NodeStats, resource::ResourceStats, topsis};
+
+/// The scheduler view of a node.
+#[derive(Clone, Debug)]
+pub struct NodeUsage {
+    /// The identifier of the node.
+    pub name: String,
+    /// The usage statistics of the node.
+    pub stats: NodeStats,
+}
 
 criteria_struct! {
     /// A given alternative.
@@ -22,69 +28,83 @@ criteria_struct! {
     static PVE_HA_TOPSIS_CRITERIA;
 }
 
-/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the resource would already be running on each.
-///
-/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
-/// is better.
-pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
-    nodes: &[T],
-    resource: &StaticServiceUsage,
-) -> Result<Vec<(String, f64)>, Error> {
-    let len = nodes.len();
-
-    let matrix = nodes
-        .iter()
-        .enumerate()
-        .map(|(target_index, _)| {
-            // Base values on percentages to allow comparing nodes with different stats.
-            let mut highest_cpu = 0.0;
-            let mut squares_cpu = 0.0;
-            let mut highest_mem = 0.0;
-            let mut squares_mem = 0.0;
-
-            for (index, node) in nodes.iter().enumerate() {
-                let node = node.as_ref();
-                let new_cpu = if index == target_index {
-                    if resource.maxcpu == 0.0 {
-                        node.cpu + node.maxcpu as f64
-                    } else {
-                        node.cpu + resource.maxcpu
-                    }
-                } else {
-                    node.cpu
-                } / (node.maxcpu as f64);
-                highest_cpu = f64::max(highest_cpu, new_cpu);
-                squares_cpu += new_cpu.powi(2);
-
-                let new_mem = if index == target_index {
-                    node.mem + resource.maxmem
-                } else {
-                    node.mem
-                } as f64
-                    / node.maxmem as f64;
-                highest_mem = f64::max(highest_mem, new_mem);
-                squares_mem += new_mem.powi(2);
-            }
-
-            // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
-            // 1.004 is only slightly more than 1.002.
-            PveTopsisAlternative {
-                average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
-                highest_cpu: 1.0 + highest_cpu,
-                average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
-                highest_memory: 1.0 + highest_mem,
-            }
-            .into()
-        })
-        .collect::<Vec<_>>();
-
-    let scores =
-        topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
-    Ok(scores
-        .into_iter()
-        .enumerate()
-        .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
-        .collect())
+#[derive(Clone, Debug)]
+pub struct Scheduler {
+    nodes: Vec<NodeUsage>,
+}
+
+impl Scheduler {
+    /// Instantiate scheduler instance from node usages.
+    pub fn from_nodes<I>(nodes: I) -> Self
+    where
+        I: IntoIterator<Item: Into<NodeUsage>>,
+    {
+        Self {
+            nodes: nodes.into_iter().map(|node| node.into()).collect(),
+        }
+    }
+
+    /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
+    ///
+    /// The scoring is done as if the resource is already started on each node. This assumes that
+    /// the already started resource consumes the maximum amount of each stat according to its
+    /// `resource_stats`.
+    ///
+    /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
+    /// score is better.
+    pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
+        &self,
+        resource_stats: T,
+    ) -> Result<Vec<(String, f64)>, Error> {
+        let len = self.nodes.len();
+        let resource_stats = resource_stats.into();
+
+        let matrix = self
+            .nodes
+            .iter()
+            .enumerate()
+            .map(|(target_index, _)| {
+                // Base values on percentages to allow comparing nodes with different stats.
+                let mut highest_cpu = 0.0;
+                let mut squares_cpu = 0.0;
+                let mut highest_mem = 0.0;
+                let mut squares_mem = 0.0;
+
+                for (index, node) in self.nodes.iter().enumerate() {
+                    let mut new_stats = node.stats;
+
+                    if index == target_index {
+                        new_stats.add_started_resource(&resource_stats)
+                    };
+
+                    let new_cpu = new_stats.cpu_load();
+                    highest_cpu = f64::max(highest_cpu, new_cpu);
+                    squares_cpu += new_cpu.powi(2);
+
+                    let new_mem = new_stats.mem_load();
+                    highest_mem = f64::max(highest_mem, new_mem);
+                    squares_mem += new_mem.powi(2);
+                }
+
+                // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+                // 1.004 is only slightly more than 1.002.
+                PveTopsisAlternative {
+                    average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+                    highest_cpu: 1.0 + highest_cpu,
+                    average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+                    highest_memory: 1.0 + highest_mem,
+                }
+                .into()
+            })
+            .collect::<Vec<_>>();
+
+        let scores =
+            topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+        Ok(scores
+            .into_iter()
+            .enumerate()
+            .map(|(n, score)| (self.nodes[n].name.clone(), score))
+            .collect())
+    }
 }
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
new file mode 100644
index 00000000..376a0a4f
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -0,0 +1,200 @@
+use anyhow::Error;
+use proxmox_resource_scheduling::{
+    node::NodeStats,
+    resource::ResourceStats,
+    scheduler::{NodeUsage, Scheduler},
+};
+
+fn new_homogeneous_cluster_scheduler() -> Scheduler {
+    let (maxcpu, maxmem) = (16, 64 * (1 << 30));
+
+    let node1 = NodeUsage {
+        name: String::from("node1"),
+        stats: NodeStats {
+            cpu: 1.7,
+            maxcpu,
+            mem: 12334 << 20,
+            maxmem,
+        },
+    };
+
+    let node2 = NodeUsage {
+        name: String::from("node2"),
+        stats: NodeStats {
+            cpu: 15.184,
+            maxcpu,
+            mem: 529 << 20,
+            maxmem,
+        },
+    };
+
+    let node3 = NodeUsage {
+        name: String::from("node3"),
+        stats: NodeStats {
+            cpu: 5.2,
+            maxcpu,
+            mem: 9381 << 20,
+            maxmem,
+        },
+    };
+
+    Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn new_heterogeneous_cluster_scheduler() -> Scheduler {
+    let node1 = NodeUsage {
+        name: String::from("node1"),
+        stats: NodeStats {
+            cpu: 1.7,
+            maxcpu: 16,
+            mem: 12334 << 20,
+            maxmem: 128 << 30,
+        },
+    };
+
+    let node2 = NodeUsage {
+        name: String::from("node2"),
+        stats: NodeStats {
+            cpu: 15.184,
+            maxcpu: 32,
+            mem: 529 << 20,
+            maxmem: 96 << 30,
+        },
+    };
+
+    let node3 = NodeUsage {
+        name: String::from("node3"),
+        stats: NodeStats {
+            cpu: 5.2,
+            maxcpu: 24,
+            mem: 9381 << 20,
+            maxmem: 64 << 30,
+        },
+    };
+
+    Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn rank_nodes_to_start_resource(
+    scheduler: &Scheduler,
+    resource_stats: ResourceStats,
+) -> Result<Vec<String>, Error> {
+    let mut alternatives = scheduler.score_nodes_to_start_resource(resource_stats)?;
+
+    alternatives.sort_by(|a, b| b.1.total_cmp(&a.1));
+
+    Ok(alternatives
+        .iter()
+        .map(|alternative| alternative.0.to_string())
+        .collect())
+}
+
+#[test]
+fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
+    let scheduler = new_homogeneous_cluster_scheduler();
+
+    let heavy_memory_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 1.0,
+        mem: 0,
+        maxmem: 12 << 30,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+        vec!["node2", "node3", "node1"]
+    );
+
+    let heavy_cpu_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 12.0,
+        mem: 0,
+        maxmem: 0,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+        vec!["node1", "node3", "node2"]
+    );
+
+    let unlimited_cpu_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 0.0,
+        mem: 0,
+        maxmem: 0,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+        vec!["node1", "node3", "node2"]
+    );
+
+    let combined_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 12.0,
+        mem: 0,
+        maxmem: 12 << 30,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, combined_resource_stats)?,
+        vec!["node2", "node3", "node1"]
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
+    let scheduler = new_heterogeneous_cluster_scheduler();
+
+    let heavy_memory_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 1.0,
+        mem: 0,
+        maxmem: 12 << 30,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+        vec!["node2", "node1", "node3"]
+    );
+
+    let heavy_cpu_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 12.0,
+        mem: 0,
+        maxmem: 0,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+        vec!["node3", "node2", "node1"]
+    );
+
+    let unlimited_cpu_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 0.0,
+        mem: 0,
+        maxmem: 0,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+        vec!["node1", "node3", "node2"]
+    );
+
+    let combined_resource_stats = ResourceStats {
+        cpu: 0.0,
+        maxcpu: 12.0,
+        mem: 0,
+        maxmem: 12 << 30,
+    };
+
+    assert_eq!(
+        rank_nodes_to_start_resource(&scheduler, combined_resource_stats)?,
+        vec!["node2", "node1", "node3"]
+    );
+
+    Ok(())
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (3 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  7:26   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH proxmox v3 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
                   ` (36 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

This is a more generic version of the `Usage` implementation from the
pve_static bindings in the pve_rs repository.

As the upcoming load balancing scheduler actions and dynamic resource
scheduler will need more information about each resource, this further
improves on the state tracking of each resource:

In this implementation, a resource is composed of its usage statistics
and its two essential states: the running state and the node placement.
The non_exhaustive attribute ensures that users must construct a
Resource instance through its API.

Users can repeatedly use the current state of Usage to make scheduling
decisions with the to_scheduler() method. This method takes an
implementation of UsageAggregator, which dictates how the usage
information is represented to the Scheduler.
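
The aggregator indirection described here can be sketched with a minimal,
self-contained mirror of the API; the type and method names below only
approximate the real proxmox-resource-scheduling interfaces and carry a
single illustrative stat instead of full CPU/memory statistics:

```rust
// Simplified stand-in types; the real NodeUsage/Usage/Scheduler live in
// proxmox-resource-scheduling.
#[derive(Clone, Debug, PartialEq)]
struct NodeUsage {
    name: String,
    load: f64,
}

struct Usage {
    nodes: Vec<NodeUsage>,
}

// The aggregator dictates how the cluster usage is represented to the
// scheduler.
trait UsageAggregator {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
}

// One possible strategy: pass the recorded node usage through unchanged.
struct PassThrough;

impl UsageAggregator for PassThrough {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
        usage.nodes.clone()
    }
}

// Mirrors the shape of Usage::to_scheduler::<F>(): the caller picks the
// aggregation strategy via the type parameter.
fn to_scheduler<F: UsageAggregator>(usage: &Usage) -> Vec<NodeUsage> {
    F::aggregate(usage)
}

fn main() {
    let usage = Usage {
        nodes: vec![NodeUsage { name: "node1".to_owned(), load: 0.5 }],
    };
    let node_usages = to_scheduler::<PassThrough>(&usage);
    assert_eq!(node_usages.len(), 1);
    assert_eq!(node_usages[0].name, "node1");
}
```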

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- inline bail! formatting variables
- s/to_string/to_owned/ where reasonable
- make Node::resources_iter(&self) return &str Iterator impl
- drop add_resource_to_nodes() and remove_resource_from_nodes()
- drop ResourcePlacement::nodenames() and Resource::nodenames()
- drop Resource::moving_to()
- fix behavior of add_resource_usage_to_node() for already added
  resources: if the next nodename did not exist, the resource was still
  put into the moving state without being added to the nodes; this is
  fixed now by improving the error handling
- inline behavior of add_resource() to make it more concise how both
  placement strategies are handled
- no change to the Resource::remove_node() documentation, as I did not
  find a better description in the meantime; since it is internal, it
  can still be improved later on

test changes v2 -> v3:
- use assertions to check that nodes were added correctly in test cases
- use assertions to check that resources were added correctly in test
  cases
- additionally assert that a resource cannot be added to a non-existent
  node with add_resource_usage_to_node() and that this does not alter
  the state of the Resource in the meantime, as it did in v2
- use assert!() instead of bail!() in test cases where appropriate

 proxmox-resource-scheduling/src/lib.rs      |   1 +
 proxmox-resource-scheduling/src/node.rs     |  40 ++++
 proxmox-resource-scheduling/src/resource.rs |  84 ++++++++
 proxmox-resource-scheduling/src/usage.rs    | 208 ++++++++++++++++++++
 proxmox-resource-scheduling/tests/usage.rs  | 181 +++++++++++++++++
 5 files changed, 514 insertions(+)
 create mode 100644 proxmox-resource-scheduling/src/usage.rs
 create mode 100644 proxmox-resource-scheduling/tests/usage.rs

diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 12b743fe..99ca16d8 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -3,6 +3,7 @@ pub mod topsis;
 
 pub mod node;
 pub mod resource;
+pub mod usage;
 
 pub mod scheduler;
 
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index e6227eda..304582ee 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -1,3 +1,5 @@
+use std::collections::HashSet;
+
 use crate::resource::ResourceStats;
 
 /// Usage statistics of a node.
@@ -37,3 +39,41 @@ impl NodeStats {
         self.mem as f64 / self.maxmem as f64
     }
 }
+
+/// A node in the cluster context.
+#[derive(Clone, Debug)]
+pub struct Node {
+    /// Base stats of the node.
+    stats: NodeStats,
+    /// The identifiers of the resources assigned to the node.
+    resources: HashSet<String>,
+}
+
+impl Node {
+    pub fn new(stats: NodeStats) -> Self {
+        Self {
+            stats,
+            resources: HashSet::new(),
+        }
+    }
+
+    pub fn add_resource(&mut self, sid: String) -> bool {
+        self.resources.insert(sid)
+    }
+
+    pub fn remove_resource(&mut self, sid: &str) -> bool {
+        self.resources.remove(sid)
+    }
+
+    pub fn stats(&self) -> NodeStats {
+        self.stats
+    }
+
+    pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
+        self.resources.iter().map(String::as_str)
+    }
+
+    pub fn contains_resource(&self, sid: &str) -> bool {
+        self.resources.contains(sid)
+    }
+}
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
index 1eb9d15e..2dbe6fa4 100644
--- a/proxmox-resource-scheduling/src/resource.rs
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -31,3 +31,87 @@ impl Sum for ResourceStats {
         iter.fold(Self::default(), |a, b| a + b)
     }
 }
+
+/// Execution state of a resource.
+#[derive(Copy, Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourceState {
+    /// The resource is stopped.
+    Stopped,
+    /// The resource is scheduled to start.
+    Starting,
+    /// The resource is started and currently running.
+    Started,
+}
+
+/// Placement of a resource.
+#[derive(Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourcePlacement {
+    /// The resource is on `current_node`.
+    Stationary { current_node: String },
+    /// The resource is being moved from `current_node` to `target_node`.
+    Moving {
+        current_node: String,
+        target_node: String,
+    },
+}
+
+/// A resource in the cluster context.
+#[derive(Clone, Debug)]
+#[non_exhaustive]
+pub struct Resource {
+    /// The usage statistics of the resource.
+    stats: ResourceStats,
+    /// The execution state of the resource.
+    state: ResourceState,
+    /// The placement of the resource.
+    placement: ResourcePlacement,
+}
+
+impl Resource {
+    pub fn new(stats: ResourceStats, state: ResourceState, placement: ResourcePlacement) -> Self {
+        Self {
+            stats,
+            state,
+            placement,
+        }
+    }
+
+    /// Handles the external removal of a node.
+    ///
+    /// Returns whether the resource does not have any node left.
+    pub fn remove_node(&mut self, nodename: &str) -> bool {
+        match &self.placement {
+            ResourcePlacement::Stationary { current_node } => current_node == nodename,
+            ResourcePlacement::Moving {
+                current_node,
+                target_node,
+            } => {
+                if current_node == nodename {
+                    self.placement = ResourcePlacement::Stationary {
+                        current_node: target_node.to_owned(),
+                    };
+                } else if target_node == nodename {
+                    self.placement = ResourcePlacement::Stationary {
+                        current_node: current_node.to_owned(),
+                    };
+                }
+
+                false
+            }
+        }
+    }
+
+    pub fn state(&self) -> ResourceState {
+        self.state
+    }
+
+    pub fn stats(&self) -> ResourceStats {
+        self.stats
+    }
+
+    pub fn placement(&self) -> &ResourcePlacement {
+        &self.placement
+    }
+}
diff --git a/proxmox-resource-scheduling/src/usage.rs b/proxmox-resource-scheduling/src/usage.rs
new file mode 100644
index 00000000..81b88452
--- /dev/null
+++ b/proxmox-resource-scheduling/src/usage.rs
@@ -0,0 +1,208 @@
+use anyhow::{bail, Error};
+
+use std::collections::HashMap;
+
+use crate::{
+    node::{Node, NodeStats},
+    resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+    scheduler::{NodeUsage, Scheduler},
+};
+
+/// The state of the usage in the cluster.
+///
+/// The cluster usage represents the current state of the assignments between nodes and resources
+/// and their usage statistics. A resource can be placed on these nodes according to their
+/// placement state. See [`crate::resource::Resource`] for more information.
+///
+/// The cluster usage state can be used to build a current state for the [`Scheduler`].
+#[derive(Default)]
+pub struct Usage {
+    nodes: HashMap<String, Node>,
+    resources: HashMap<String, Resource>,
+}
+
+/// An aggregator for the [`Usage`] maps the cluster usage to node usage statistics that are
+/// relevant for the scheduler.
+pub trait UsageAggregator {
+    fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
+}
+
+impl Usage {
+    /// Instantiate an empty cluster usage.
+    pub fn new() -> Self {
+        Self::default()
+    }
+
+    /// Add a node to the cluster usage.
+    ///
+    /// This method fails if a node with the same `nodename` already exists.
+    pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
+        if self.nodes.contains_key(&nodename) {
+            bail!("node '{nodename}' already exists");
+        }
+
+        self.nodes.insert(nodename, Node::new(stats));
+
+        Ok(())
+    }
+
+    /// Remove a node from the cluster usage.
+    pub fn remove_node(&mut self, nodename: &str) {
+        if let Some(node) = self.nodes.remove(nodename) {
+            node.resources_iter().for_each(|sid| {
+                if let Some(resource) = self.resources.get_mut(sid)
+                    && resource.remove_node(nodename)
+                {
+                    self.resources.remove(sid);
+                }
+            });
+        }
+    }
+
+    /// Returns a reference to the [`Node`] with the identifier `nodename`.
+    pub fn get_node(&self, nodename: &str) -> Option<&Node> {
+        self.nodes.get(nodename)
+    }
+
+    /// Returns an iterator for the cluster usage's nodes.
+    pub fn nodes_iter(&self) -> impl Iterator<Item = (&String, &Node)> {
+        self.nodes.iter()
+    }
+
+    /// Returns an iterator over the cluster usage's node names.
+    pub fn nodenames_iter(&self) -> impl Iterator<Item = &String> {
+        self.nodes.keys()
+    }
+
+    /// Returns whether the node with the identifier `nodename` is present in the cluster usage.
+    pub fn contains_node(&self, nodename: &str) -> bool {
+        self.nodes.contains_key(nodename)
+    }
+
+    /// Add `resource` with identifier `sid` to cluster usage.
+    ///
+    /// This method fails if a resource with the same `sid` already exists or the resource's nodes
+    /// do not exist in the cluster usage.
+    pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
+        if self.resources.contains_key(&sid) {
+            bail!("resource '{sid}' already exists");
+        }
+
+        match resource.placement() {
+            ResourcePlacement::Stationary { current_node } => {
+                match self.nodes.get_mut(current_node) {
+                    Some(current_node) => {
+                        current_node.add_resource(sid.to_owned());
+                    }
+                    _ => bail!("current node for resource '{sid}' does not exist"),
+                }
+            }
+            ResourcePlacement::Moving {
+                current_node,
+                target_node,
+            } => {
+                if current_node == target_node {
+                    bail!("resource '{sid}' has the same current and target node");
+                }
+
+                match self.nodes.get_disjoint_mut([current_node, target_node]) {
+                    [Some(current_node), Some(target_node)] => {
+                        current_node.add_resource(sid.to_owned());
+                        target_node.add_resource(sid.to_owned());
+                    }
+                    _ => bail!("nodes for resource '{sid}' do not exist"),
+                }
+            }
+        }
+
+        self.resources.insert(sid, resource);
+
+        Ok(())
+    }
+
+    /// Add `stats` from resource with identifier `sid` to node `nodename` in cluster usage.
+    ///
+    /// For the first call, the resource is assumed to be started and stationary on the given node.
+    /// If there was no intermediate call to remove the resource, the second call will assume that
+    /// the given node is the target node and the resource is being moved there. The second call
+    /// will ignore the value of `stats`.
+    #[deprecated = "only for backwards compatibility, use add_resource(...) instead"]
+    pub fn add_resource_usage_to_node(
+        &mut self,
+        nodename: &str,
+        sid: &str,
+        stats: ResourceStats,
+    ) -> Result<(), Error> {
+        if let Some(resource) = self.resources.remove(sid) {
+            match resource.placement() {
+                ResourcePlacement::Stationary { current_node } => {
+                    let placement = ResourcePlacement::Moving {
+                        current_node: current_node.to_owned(),
+                        target_node: nodename.to_owned(),
+                    };
+                    let new_resource = Resource::new(resource.stats(), resource.state(), placement);
+
+                    if let Err(err) = self.add_resource(sid.to_owned(), new_resource) {
+                        self.add_resource(sid.to_owned(), resource)?;
+
+                        bail!(err);
+                    }
+
+                    Ok(())
+                }
+                ResourcePlacement::Moving { target_node, .. } => {
+                    bail!("resource '{sid}' is already moving to target node '{target_node}'")
+                }
+            }
+        } else {
+            let placement = ResourcePlacement::Stationary {
+                current_node: nodename.to_owned(),
+            };
+            let resource = Resource::new(stats, ResourceState::Started, placement);
+
+            self.add_resource(sid.to_owned(), resource)
+        }
+    }
+
+    /// Remove resource with identifier `sid` from cluster usage.
+    pub fn remove_resource(&mut self, sid: &str) {
+        if let Some(resource) = self.resources.remove(sid) {
+            match resource.placement() {
+                ResourcePlacement::Stationary { current_node } => {
+                    if let Some(current_node) = self.nodes.get_mut(current_node) {
+                        current_node.remove_resource(sid);
+                    }
+                }
+                ResourcePlacement::Moving {
+                    current_node,
+                    target_node,
+                } => {
+                    if let Some(current_node) = self.nodes.get_mut(current_node) {
+                        current_node.remove_resource(sid);
+                    }
+
+                    if let Some(target_node) = self.nodes.get_mut(target_node) {
+                        target_node.remove_resource(sid);
+                    }
+                }
+            }
+        }
+    }
+
+    /// Returns a reference to the [`Resource`] with the identifier `sid`.
+    pub fn get_resource(&self, sid: &str) -> Option<&Resource> {
+        self.resources.get(sid)
+    }
+
+    /// Returns an iterator for the cluster usage's resources.
+    pub fn resources_iter(&self) -> impl Iterator<Item = (&String, &Resource)> {
+        self.resources.iter()
+    }
+
+    /// Use the current cluster usage as a base for a scheduling action.
+    pub fn to_scheduler<F: UsageAggregator>(&self) -> Scheduler {
+        let node_usages = F::aggregate(self);
+
+        Scheduler::from_nodes(node_usages)
+    }
+}
diff --git a/proxmox-resource-scheduling/tests/usage.rs b/proxmox-resource-scheduling/tests/usage.rs
new file mode 100644
index 00000000..b6cb5a6e
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/usage.rs
@@ -0,0 +1,181 @@
+use proxmox_resource_scheduling::{
+    node::NodeStats,
+    resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+    usage::Usage,
+};
+
+#[test]
+fn test_no_duplicate_nodes() {
+    let mut usage = Usage::new();
+
+    assert!(usage
+        .add_node("node1".to_owned(), NodeStats::default())
+        .is_ok());
+
+    assert!(
+        usage
+            .add_node("node1".to_owned(), NodeStats::default())
+            .is_err(),
+        "cluster usage does allow duplicate node entries"
+    );
+}
+
+#[test]
+fn test_no_duplicate_resources() {
+    let mut usage = Usage::new();
+
+    assert!(usage
+        .add_node("node1".to_owned(), NodeStats::default())
+        .is_ok());
+
+    let placement = ResourcePlacement::Stationary {
+        current_node: "node1".to_owned(),
+    };
+    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+    assert!(usage
+        .add_resource("vm:101".to_owned(), resource.clone())
+        .is_ok());
+
+    assert!(
+        usage.add_resource("vm:101".to_owned(), resource).is_err(),
+        "cluster usage does allow duplicate resource entries"
+    );
+}
+
+fn assert_add_node(usage: &mut Usage, nodename: &str) {
+    assert!(usage
+        .add_node(nodename.to_owned(), NodeStats::default())
+        .is_ok());
+
+    assert!(
+        usage.get_node(nodename).is_some(),
+        "node '{nodename}' was not added"
+    );
+}
+
+fn assert_add_resource(usage: &mut Usage, sid: &str, resource: Resource) {
+    assert!(usage.add_resource(sid.to_owned(), resource).is_ok());
+
+    assert!(
+        usage.get_resource(sid).is_some(),
+        "resource '{sid}' was not added"
+    );
+}
+
+#[test]
+#[allow(deprecated)]
+fn test_add_resource_usage_to_node() {
+    let mut usage = Usage::new();
+
+    assert_add_node(&mut usage, "node1");
+    assert_add_node(&mut usage, "node2");
+    assert_add_node(&mut usage, "node3");
+
+    assert!(usage
+        .add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())
+        .is_ok());
+
+    assert!(
+        usage
+            .add_resource_usage_to_node("node4", "vm:101", ResourceStats::default())
+            .is_err(),
+        "add_resource_usage_to_node() allows adding non-existent nodes"
+    );
+
+    assert!(usage
+        .add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())
+        .is_ok());
+
+    assert!(
+        usage
+            .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
+            .is_err(),
+        "add_resource_usage_to_node() allows adding resources to more than two nodes"
+    );
+}
+
+#[test]
+fn test_add_remove_stationary_resource() {
+    let mut usage = Usage::new();
+
+    let (sid, nodename) = ("vm:101", "node1");
+
+    assert_add_node(&mut usage, nodename);
+
+    let placement = ResourcePlacement::Stationary {
+        current_node: nodename.to_owned(),
+    };
+    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+    assert_add_resource(&mut usage, sid, resource);
+
+    if let Some(node) = usage.get_node(nodename) {
+        assert!(
+            node.contains_resource(sid),
+            "resource '{sid}' was not added to node '{nodename}'"
+        );
+    }
+
+    usage.remove_resource(sid);
+
+    assert!(
+        usage.get_resource(sid).is_none(),
+        "resource '{sid}' was not removed"
+    );
+
+    if let Some(node) = usage.get_node(nodename) {
+        assert!(
+            !node.contains_resource(sid),
+            "resource '{sid}' was not removed from node '{nodename}'"
+        );
+    }
+}
+
+#[test]
+fn test_add_remove_moving_resource() {
+    let mut usage = Usage::new();
+
+    let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
+
+    assert_add_node(&mut usage, current_nodename);
+    assert_add_node(&mut usage, target_nodename);
+
+    let placement = ResourcePlacement::Moving {
+        current_node: current_nodename.to_owned(),
+        target_node: target_nodename.to_owned(),
+    };
+    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+    assert_add_resource(&mut usage, sid, resource);
+
+    if let Some(current_node) = usage.get_node(current_nodename) {
+        assert!(
+            current_node.contains_resource(sid),
+            "resource '{sid}' was not added to current node '{current_nodename}'"
+        );
+    }
+
+    if let Some(target_node) = usage.get_node(target_nodename) {
+        assert!(
+            target_node.contains_resource(sid),
+            "resource '{sid}' was not added to target node '{target_nodename}'"
+        );
+    }
+
+    usage.remove_resource(sid);
+
+    if let Some(current_node) = usage.get_node(current_nodename) {
+        assert!(
+            !current_node.contains_resource(sid),
+            "resource '{sid}' was not removed from current node '{current_nodename}'"
+        );
+    }
+
+    if let Some(target_node) = usage.get_node(target_nodename) {
+        assert!(
+            !target_node.contains_resource(sid),
+            "resource '{sid}' was not removed from target node '{target_nodename}'"
+        );
+    }
+}
-- 
2.47.3






* [PATCH proxmox v3 06/40] resource-scheduling: topsis: handle empty criteria without panics
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (4 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Iterator::min_by(...) and Iterator::max_by(...) only return `None` if
there are no entries in the `Matrix` column at all. This can only happen
if the `Matrix` doesn't have any row entries.

This makes any call to score_alternatives(...), the only current user of
IdealAlternatives::compute(...), panic if no alternatives are given.
Therefore, use reasonable default values.

This has not happened yet, because the only non-test caller of
score_alternatives(...) is score_nodes_to_start_resource(...), which
always has nodes present in production.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/topsis.rs | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/proxmox-resource-scheduling/src/topsis.rs b/proxmox-resource-scheduling/src/topsis.rs
index 6d078aa6..ed5a9bd1 100644
--- a/proxmox-resource-scheduling/src/topsis.rs
+++ b/proxmox-resource-scheduling/src/topsis.rs
@@ -145,8 +145,10 @@ impl<const N: usize> IdealAlternatives<N> {
             let min = fixed_criterion
                 .clone()
                 .min_by(|a, b| a.total_cmp(b))
-                .unwrap();
-            let max = fixed_criterion.max_by(|a, b| a.total_cmp(b)).unwrap();
+                .unwrap_or(f64::NEG_INFINITY);
+            let max = fixed_criterion
+                .max_by(|a, b| a.total_cmp(b))
+                .unwrap_or(f64::INFINITY);
 
             (best[n], worst[n]) = match criteria[n].maximize {
                 true => (max, min),
-- 
2.47.3






* [PATCH proxmox v3 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (5 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Even though comparing by index is slightly faster here, comparing by
nodename makes it possible to factor this out in an upcoming patch.

This should increase runtime only marginally, as the extra cost is
roughly node_count^2 string comparisons, each bounded by the maximum
hostname length.
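
As a standalone sanity check of this equivalence (assuming node names
are unique, which holds for the scheduler's node list), the old index
comparison and the new nodename comparison select the same target in
every iteration of the nested loop:

```rust
fn main() {
    let names = ["node1", "node2", "node3"];

    for (target_index, target_name) in names.iter().enumerate() {
        for (index, name) in names.iter().enumerate() {
            // With unique names, comparing indices and comparing names
            // identify exactly the same (node, target) pairs.
            assert_eq!(index == target_index, name == target_name);
        }
    }
}
```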

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/scheduler.rs | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 0a27a25c..33ef4586 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -62,18 +62,17 @@ impl Scheduler {
         let matrix = self
             .nodes
             .iter()
-            .enumerate()
-            .map(|(target_index, _)| {
+            .map(|node| {
                 // Base values on percentages to allow comparing nodes with different stats.
                 let mut highest_cpu = 0.0;
                 let mut squares_cpu = 0.0;
                 let mut highest_mem = 0.0;
                 let mut squares_mem = 0.0;
 
-                for (index, node) in self.nodes.iter().enumerate() {
-                    let mut new_stats = node.stats;
+                for target_node in self.nodes.iter() {
+                    let mut new_stats = target_node.stats;
 
-                    if index == target_index {
+                    if node.name == target_node.name {
                         new_stats.add_started_resource(&resource_stats)
                     };
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH proxmox v3 08/40] resource-scheduling: factor out topsis alternative mapping
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (6 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The same calculation will be needed for the scoring of migrations with
the TOPSIS method in the following patch.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 proxmox-resource-scheduling/src/scheduler.rs | 68 ++++++++++++--------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 33ef4586..5aca549d 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -44,6 +44,44 @@ impl Scheduler {
         }
     }
 
+    /// Map the current node usages to a [`PveTopsisAlternative`].
+    ///
+    /// The [`PveTopsisAlternative`] is derived by calculating a modified version of the root mean
+    /// square (RMS) and maximum value of each stat in the node usages.
+    fn topsis_alternative_with(
+        &self,
+        map_node_stats: impl Fn(&NodeUsage) -> NodeStats,
+    ) -> PveTopsisAlternative {
+        let len = self.nodes.len();
+
+        // Base values on percentages to allow comparing nodes with different stats.
+        let mut highest_cpu = 0.0;
+        let mut squares_cpu = 0.0;
+        let mut highest_mem = 0.0;
+        let mut squares_mem = 0.0;
+
+        for node in self.nodes.iter() {
+            let new_stats = map_node_stats(node);
+
+            let new_cpu = new_stats.cpu_load();
+            highest_cpu = f64::max(highest_cpu, new_cpu);
+            squares_cpu += new_cpu.powi(2);
+
+            let new_mem = new_stats.mem_load();
+            highest_mem = f64::max(highest_mem, new_mem);
+            squares_mem += new_mem.powi(2);
+        }
+
+        // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+        // 1.004 is only slightly more than 1.002.
+        PveTopsisAlternative {
+            average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+            highest_cpu: 1.0 + highest_cpu,
+            average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+            highest_memory: 1.0 + highest_mem,
+        }
+    }
+
     /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
     ///
     /// The scoring is done as if the resource is already started on each node. This assumes that
@@ -56,43 +94,21 @@ impl Scheduler {
         &self,
         resource_stats: T,
     ) -> Result<Vec<(String, f64)>, Error> {
-        let len = self.nodes.len();
         let resource_stats = resource_stats.into();
 
         let matrix = self
             .nodes
             .iter()
             .map(|node| {
-                // Base values on percentages to allow comparing nodes with different stats.
-                let mut highest_cpu = 0.0;
-                let mut squares_cpu = 0.0;
-                let mut highest_mem = 0.0;
-                let mut squares_mem = 0.0;
-
-                for target_node in self.nodes.iter() {
+                self.topsis_alternative_with(|target_node| {
                     let mut new_stats = target_node.stats;
 
                     if node.name == target_node.name {
                         new_stats.add_started_resource(&resource_stats)
-                    };
+                    }
 
-                    let new_cpu = new_stats.cpu_load();
-                    highest_cpu = f64::max(highest_cpu, new_cpu);
-                    squares_cpu += new_cpu.powi(2);
-
-                    let new_mem = new_stats.mem_load();
-                    highest_mem = f64::max(highest_mem, new_mem);
-                    squares_mem += new_mem.powi(2);
-                }
-
-                // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
-                // 1.004 is only slightly more than 1.002.
-                PveTopsisAlternative {
-                    average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
-                    highest_cpu: 1.0 + highest_cpu,
-                    average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
-                    highest_memory: 1.0 + highest_mem,
-                }
+                    new_stats
+                })
                 .into()
             })
             .collect::<Vec<_>>();
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (7 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  7:33   ` Dominik Rusovac
  2026-03-31 12:42   ` Michael Köppl
  2026-03-30 14:30 ` [PATCH perl-rs v3 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
                   ` (32 subsequent siblings)
  41 siblings, 2 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Assuming that a resource will hold the same dynamic resource usage on a
new node as on its previous node, score possible migrations, where:

- the cluster node imbalance is minimal (brute force), or
- the shifted root mean square and maximum resource usages of cpu and
  memory are minimal across the cluster nodes (TOPSIS).
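
For the brute-force variant, the imbalance metric is the coefficient of
variation (standard deviation over mean) of the per-node loads. A
minimal standalone sketch of that metric, with illustrative load values
that are not taken from the series:

```rust
// Sketch of the coefficient-of-variation imbalance metric: zero for a
// perfectly balanced cluster, growing as load concentrates on one node.
fn coefficient_of_variation(loads: &[f64]) -> f64 {
    let sum: f64 = loads.iter().sum();
    if sum == 0.0 {
        // An empty or fully idle cluster counts as perfectly balanced.
        return 0.0;
    }
    let mean = sum / loads.len() as f64;
    let variance = loads
        .iter()
        .map(|load| (load - mean).powi(2))
        .sum::<f64>()
        / loads.len() as f64;
    variance.sqrt() / mean
}

fn main() {
    // Identical loads have zero dispersion ...
    assert_eq!(coefficient_of_variation(&[0.5, 0.5, 0.5]), 0.0);
    // ... while shifting load onto one node raises the imbalance.
    let skewed = coefficient_of_variation(&[0.9, 0.3, 0.3]);
    assert!(skewed > 0.0);
}
```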

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- fix wording in ScoredMigration::new() documentation
- use f64::powi instead of f64::powf in ScoredMigration::new()
- adapt wording in MigrationCandidate `stats` member documentation
- only compare order of return value of
  score_best_balancing_migration_candidates{,_topsis}() in test cases
  instead of equal between the imbalance scores
- introduce rank_best_balancing_migration_candidates{,_topsis}() for the
  test cases for reduced code duplication
- use assert! instead of bail! wherever appropriate
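
Regarding the truncation in ScoredMigration::new(): a minimal sketch of
the idea, dropping digits beyond f64::DIGITS (15) so floating-point
noise in the imbalance does not make otherwise-equal scores order
differently (the input values below are illustrative):

```rust
// Sketch mirroring the truncation in ScoredMigration::new(): cut off
// digits beyond the 15th decimal place before comparing imbalances.
fn truncate_imbalance(imbalance: f64) -> f64 {
    let factor = 10_f64.powi(f64::DIGITS as i32); // f64::DIGITS == 15
    f64::trunc(factor * imbalance) / factor
}

fn main() {
    // Values exactly representable within 15 decimal digits pass
    // through unchanged.
    assert_eq!(truncate_imbalance(0.25), 0.25);
    // Truncation never increases a non-negative value.
    assert!(truncate_imbalance(1.0 / 3.0) <= 1.0 / 3.0);
}
```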

 proxmox-resource-scheduling/src/node.rs       |  17 ++
 proxmox-resource-scheduling/src/scheduler.rs  | 282 ++++++++++++++++++
 .../tests/scheduler.rs                        | 181 ++++++++++-
 3 files changed, 479 insertions(+), 1 deletion(-)

diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index 304582ee..e6d4ff5b 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -29,6 +29,18 @@ impl NodeStats {
         self.mem += resource_stats.maxmem;
     }
 
+    /// Adds the resource stats to the node stats as if the resource is running on the node.
+    pub fn add_running_resource(&mut self, resource_stats: &ResourceStats) {
+        self.cpu += resource_stats.cpu;
+        self.mem += resource_stats.mem;
+    }
+
+    /// Removes the resource stats from the node stats as if the resource is not running on the node.
+    pub fn remove_running_resource(&mut self, resource_stats: &ResourceStats) {
+        self.cpu -= resource_stats.cpu;
+        self.mem = self.mem.saturating_sub(resource_stats.mem);
+    }
+
     /// Returns the current cpu usage as a percentage.
     pub fn cpu_load(&self) -> f64 {
         self.cpu / self.maxcpu as f64
@@ -38,6 +50,11 @@ impl NodeStats {
     pub fn mem_load(&self) -> f64 {
         self.mem as f64 / self.maxmem as f64
     }
+
+    /// Returns a combined node usage as a percentage.
+    pub fn load(&self) -> f64 {
+        (self.cpu_load() + self.mem_load()) / 2.0
+    }
 }
 
 /// A node in the cluster context.
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 5aca549d..49d16f9f 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -2,6 +2,12 @@ use anyhow::Error;
 
 use crate::{node::NodeStats, resource::ResourceStats, topsis};
 
+use serde::{Deserialize, Serialize};
+use std::{
+    cmp::{Ordering, Reverse},
+    collections::BinaryHeap,
+};
+
 /// The scheduler view of a node.
 #[derive(Clone, Debug)]
 pub struct NodeUsage {
@@ -11,6 +17,36 @@ pub struct NodeUsage {
     pub stats: NodeStats,
 }
 
+/// Returns the load imbalance among the nodes.
+///
+/// The load balance is measured as the statistical dispersion of the individual node loads.
+///
+/// The current implementation uses the dimensionless coefficient of variation, which expresses the
+/// standard deviation in relation to the average mean of the node loads.
+///
+/// The coefficient of variation is not robust, which is a desired property here, because outliers
+/// should be detected as much as possible.
+fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
+    let node_count = nodes.len();
+    let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
+
+    let load_sum = node_loads.iter().sum::<f64>();
+
+    // load_sum is guaranteed to be -0.0 for empty `nodes`
+    if load_sum == 0.0 {
+        0.0
+    } else {
+        let load_mean = load_sum / node_count as f64;
+
+        let squared_diff_sum = node_loads
+            .iter()
+            .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
+        let load_sd = (squared_diff_sum / node_count as f64).sqrt();
+
+        load_sd / load_mean
+    }
+}
+
 criteria_struct! {
     /// A given alternative.
     struct PveTopsisAlternative {
@@ -33,6 +69,83 @@ pub struct Scheduler {
     nodes: Vec<NodeUsage>,
 }
 
+/// A possible migration.
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct Migration {
+    /// The identifier of a leading resource.
+    pub sid: String,
+    /// The current node of the leading resource.
+    pub source_node: String,
+    /// The possible migration target node for the resource.
+    pub target_node: String,
+}
+
+/// A possible migration with a score.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct ScoredMigration {
+    /// The possible migration.
+    pub migration: Migration,
+    /// The expected node imbalance after the migration.
+    pub imbalance: f64,
+}
+
+impl Ord for ScoredMigration {
+    fn cmp(&self, other: &Self) -> Ordering {
+        self.imbalance
+            .total_cmp(&other.imbalance)
+            .then(self.migration.cmp(&other.migration))
+    }
+}
+
+impl PartialOrd for ScoredMigration {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl PartialEq for ScoredMigration {
+    fn eq(&self, other: &Self) -> bool {
+        self.cmp(other) == Ordering::Equal
+    }
+}
+
+impl Eq for ScoredMigration {}
+
+impl ScoredMigration {
+    pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
+        // Depending on how the imbalance is calculated, it can contain minor approximation errors.
+        // As this struct implements the Ord trait, users of the struct's cmp() can run into cases,
+        // where the imbalance is the same up to the significant digits in base 10, but treated as
+        // different values.
+        //
+        // Therefore, truncate any non-significant digits to prevent these cases.
+        let factor = 10_f64.powi(f64::DIGITS as i32);
+        let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
+
+        Self {
+            migration: migration.into(),
+            imbalance: truncated_imbalance,
+        }
+    }
+}
+
+/// A possible migration candidate with the migrated usage stats.
+#[derive(Clone, Debug)]
+pub struct MigrationCandidate {
+    /// The possible migration.
+    pub migration: Migration,
+    /// Usage stats of the resource(s) to be migrated.
+    pub stats: ResourceStats,
+}
+
+impl From<MigrationCandidate> for Migration {
+    fn from(candidate: MigrationCandidate) -> Self {
+        candidate.migration
+    }
+}
+
 impl Scheduler {
     /// Instantiate scheduler instance from node usages.
     pub fn from_nodes<I>(nodes: I) -> Self
@@ -82,6 +195,123 @@ impl Scheduler {
         }
     }
 
+    /// Returns the load imbalance among the nodes.
+    ///
+    /// See [`calculate_node_imbalance`] for more information.
+    pub fn node_imbalance(&self) -> f64 {
+        calculate_node_imbalance(&self.nodes, |node| node.stats.load())
+    }
+
+    /// Returns the load imbalance among the nodes as if a specific resource was moved.
+    ///
+    /// See [`calculate_node_imbalance`] for more information.
+    fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64 {
+        calculate_node_imbalance(&self.nodes, |node| {
+            let mut new_stats = node.stats;
+
+            if node.name == candidate.migration.source_node {
+                new_stats.remove_running_resource(&candidate.stats);
+            } else if node.name == candidate.migration.target_node {
+                new_stats.add_running_resource(&candidate.stats);
+            }
+
+            new_stats.load()
+        })
+    }
+
+    /// Scores the given migration `candidates` by the best node imbalance improvement with
+    /// exhaustive search.
+    ///
+    /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+    /// done whether the given nodenames actually exist in the scheduler.
+    ///
+    /// The scoring is done as if each resource migration has already been done. This assumes that
+    /// the already migrated resource consumes the same amount of each stat as on the previous node
+    /// according to its `stats`.
+    ///
+    /// Returns up to `limit` of the best scored migrations.
+    pub fn score_best_balancing_migration_candidates<I>(
+        &self,
+        candidates: I,
+        limit: usize,
+    ) -> Vec<ScoredMigration>
+    where
+        I: IntoIterator<Item = MigrationCandidate>,
+    {
+        let mut scored_migrations = candidates
+            .into_iter()
+            .map(|candidate| {
+                let imbalance = self.node_imbalance_with_migration_candidate(&candidate);
+
+                Reverse(ScoredMigration::new(candidate, imbalance))
+            })
+            .collect::<BinaryHeap<_>>();
+
+        let mut best_migrations = Vec::with_capacity(limit);
+
+        // BinaryHeap::into_iter_sorted() is still in nightly unfortunately
+        while best_migrations.len() < limit {
+            match scored_migrations.pop() {
+                Some(Reverse(alternative)) => best_migrations.push(alternative),
+                None => break,
+            }
+        }
+
+        best_migrations
+    }
+
+    /// Scores the given migration `candidates` by the best node imbalance improvement with the
+    /// TOPSIS method.
+    ///
+    /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+    /// done whether the given nodenames actually exist in the scheduler.
+    ///
+    /// The scoring is done as if each resource migration has already been done. This assumes that
+    /// the already migrated resource consumes the same amount of each stat as on the previous node
+    /// according to its `stats`.
+    ///
+    /// Returns up to `limit` of the best scored migrations.
+    pub fn score_best_balancing_migration_candidates_topsis(
+        &self,
+        candidates: &[MigrationCandidate],
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let matrix = candidates
+            .iter()
+            .map(|candidate| {
+                let resource_stats = &candidate.stats;
+                let source_node = &candidate.migration.source_node;
+                let target_node = &candidate.migration.target_node;
+
+                self.topsis_alternative_with(|node| {
+                    let mut new_stats = node.stats;
+
+                    if &node.name == source_node {
+                        new_stats.remove_running_resource(resource_stats);
+                    } else if &node.name == target_node {
+                        new_stats.add_running_resource(resource_stats);
+                    }
+
+                    new_stats
+                })
+                .into()
+            })
+            .collect::<Vec<_>>();
+
+        let best_alternatives =
+            topsis::rank_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+        Ok(best_alternatives
+            .into_iter()
+            .take(limit)
+            .map(|i| {
+                let imbalance = self.node_imbalance_with_migration_candidate(&candidates[i]);
+
+                ScoredMigration::new(candidates[i].clone(), imbalance)
+            })
+            .collect())
+    }
+
     /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
     ///
     /// The scoring is done as if the resource is already started on each node. This assumes that
@@ -123,3 +353,55 @@ impl Scheduler {
             .collect())
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_scored_migration_order() {
+        let migration1 = ScoredMigration::new(
+            Migration {
+                sid: String::from("vm:102"),
+                source_node: String::from("node1"),
+                target_node: String::from("node2"),
+            },
+            0.7231749488916931,
+        );
+        let migration2 = ScoredMigration::new(
+            Migration {
+                sid: String::from("vm:102"),
+                source_node: String::from("node1"),
+                target_node: String::from("node3"),
+            },
+            0.723174948891693,
+        );
+        let migration3 = ScoredMigration::new(
+            Migration {
+                sid: String::from("vm:101"),
+                source_node: String::from("node1"),
+                target_node: String::from("node2"),
+            },
+            0.723174948891693 + 1e-15,
+        );
+
+        let mut migrations = vec![migration2.clone(), migration3.clone(), migration1.clone()];
+
+        migrations.sort();
+
+        assert_eq!(
+            vec![migration1.clone(), migration2.clone(), migration3.clone()],
+            migrations
+        );
+
+        let mut heap = BinaryHeap::from(vec![
+            Reverse(migration2.clone()),
+            Reverse(migration3.clone()),
+            Reverse(migration1.clone()),
+        ]);
+
+        assert_eq!(heap.pop(), Some(Reverse(migration1)));
+        assert_eq!(heap.pop(), Some(Reverse(migration2)));
+        assert_eq!(heap.pop(), Some(Reverse(migration3)));
+    }
+}
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
index 376a0a4f..be90e4f9 100644
--- a/proxmox-resource-scheduling/tests/scheduler.rs
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -2,9 +2,13 @@ use anyhow::Error;
 use proxmox_resource_scheduling::{
     node::NodeStats,
     resource::ResourceStats,
-    scheduler::{NodeUsage, Scheduler},
+    scheduler::{Migration, MigrationCandidate, NodeUsage, Scheduler},
 };
 
+fn new_empty_cluster_scheduler() -> Scheduler {
+    Scheduler::from_nodes(Vec::<NodeUsage>::new())
+}
+
 fn new_homogeneous_cluster_scheduler() -> Scheduler {
     let (maxcpu, maxmem) = (16, 64 * (1 << 30));
 
@@ -75,6 +79,181 @@ fn new_heterogeneous_cluster_scheduler() -> Scheduler {
     Scheduler::from_nodes(vec![node1, node2, node3])
 }
 
+#[test]
+fn test_node_imbalance_with_empty_cluster() {
+    let scheduler = new_empty_cluster_scheduler();
+
+    assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+#[test]
+fn test_node_imbalance_with_perfectly_balanced_cluster() {
+    let node = NodeUsage {
+        name: String::from("node1"),
+        stats: NodeStats {
+            cpu: 1.7,
+            maxcpu: 16,
+            mem: 224395264,
+            maxmem: 68719476736,
+        },
+    };
+
+    let scheduler = Scheduler::from_nodes(vec![node.clone()]);
+
+    assert_eq!(scheduler.node_imbalance(), 0.0);
+
+    let scheduler = Scheduler::from_nodes(vec![node.clone(), node.clone(), node]);
+
+    assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+fn new_simple_migration_candidates() -> (Vec<MigrationCandidate>, Migration, Migration) {
+    let migration1 = Migration {
+        sid: String::from("vm:101"),
+        source_node: String::from("node1"),
+        target_node: String::from("node2"),
+    };
+    let migration2 = Migration {
+        sid: String::from("vm:101"),
+        source_node: String::from("node1"),
+        target_node: String::from("node3"),
+    };
+    let stats = ResourceStats {
+        cpu: 0.7,
+        maxcpu: 4.0,
+        mem: 8 << 30,
+        maxmem: 16 << 30,
+    };
+
+    let candidates = vec![
+        MigrationCandidate {
+            migration: migration1.clone(),
+            stats,
+        },
+        MigrationCandidate {
+            migration: migration2.clone(),
+            stats,
+        },
+    ];
+
+    (candidates, migration1, migration2)
+}
+
+fn assert_imbalance(imbalance: f64, expected_imbalance: f64) {
+    assert!(
+        (expected_imbalance - imbalance).abs() <= f64::EPSILON,
+        "imbalance is {imbalance}, but was expected to be {expected_imbalance}"
+    );
+}
+
+fn rank_best_balancing_migration_candidates(
+    scheduler: &Scheduler,
+    candidates: Vec<MigrationCandidate>,
+    limit: usize,
+) -> Vec<Migration> {
+    scheduler
+        .score_best_balancing_migration_candidates(candidates, limit)
+        .into_iter()
+        .map(|entry| entry.migration)
+        .collect()
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_with_no_candidates() {
+    let scheduler = new_homogeneous_cluster_scheduler();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates(&scheduler, vec![], 2),
+        vec![]
+    );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_homogeneous_cluster() {
+    let scheduler = new_homogeneous_cluster_scheduler();
+
+    assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+    let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates(&scheduler, candidates, 2),
+        vec![migration2, migration1]
+    );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_heterogeneous_cluster() {
+    let scheduler = new_heterogeneous_cluster_scheduler();
+
+    assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+    let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates(&scheduler, candidates, 2),
+        vec![migration2, migration1]
+    );
+}
+
+fn rank_best_balancing_migration_candidates_topsis(
+    scheduler: &Scheduler,
+    candidates: &[MigrationCandidate],
+    limit: usize,
+) -> Result<Vec<Migration>, Error> {
+    Ok(scheduler
+        .score_best_balancing_migration_candidates_topsis(candidates, limit)?
+        .into_iter()
+        .map(|entry| entry.migration)
+        .collect())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_with_no_candidates() -> Result<(), Error> {
+    let scheduler = new_homogeneous_cluster_scheduler();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates_topsis(&scheduler, &[], 2)?,
+        vec![]
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_homogeneous_cluster(
+) -> Result<(), Error> {
+    let scheduler = new_homogeneous_cluster_scheduler();
+
+    assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+    let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates_topsis(&scheduler, &candidates, 2)?,
+        vec![migration1, migration2]
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_heterogeneous_cluster(
+) -> Result<(), Error> {
+    let scheduler = new_heterogeneous_cluster_scheduler();
+
+    assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+    let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+    assert_eq!(
+        rank_best_balancing_migration_candidates_topsis(&scheduler, &candidates, 2)?,
+        vec![migration1, migration2]
+    );
+
+    Ok(())
+}
+
 fn rank_nodes_to_start_resource(
     scheduler: &Scheduler,
     resource_stats: ResourceStats,
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH perl-rs v3 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (8 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH perl-rs v3 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The error can only happen due to an earlier error in
add_service_usage_to_node(...), but it prevents all of the following
service_nodes entries from being cleaned up correctly.

While technically an API break, removing the error does not change any
callers, since none of them handle the error anyway. Additionally,
remove_node(...) is only used in test code in this package and in
pve-ha-manager, but is currently unused in production code.

This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 5b91d36..6e57b9d 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -75,25 +75,16 @@ pub mod pve_rs_resource_scheduling_static {
 
     /// Method: Remove a node from the scheduler.
     #[export]
-    pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> Result<(), Error> {
+    pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
         let mut usage = this.inner.lock().unwrap();
 
         if let Some(node) = usage.nodes.remove(nodename) {
             for (sid, _) in node.services.iter() {
-                match usage.service_nodes.get_mut(sid) {
-                    Some(service_nodes) => {
-                        service_nodes.remove(nodename);
-                    }
-                    None => bail!(
-                        "service '{}' not present in service_nodes hashmap while removing node '{}'",
-                        sid,
-                        nodename
-                    ),
+                if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
+                    service_nodes.remove(nodename);
                 }
             }
         }
-
-        Ok(())
     }
 
     /// Method: Get a list of all the nodes in the scheduler.
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH perl-rs v3 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (9 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH perl-rs v3 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The error can only happen due to an earlier error in
add_service_usage_to_node(...), but it prevents all of the following
node services entries from being cleaned up correctly.

While technically an API break, removing the error does not change any
callers.

This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 6e57b9d..b8eac57 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -145,25 +145,16 @@ pub mod pve_rs_resource_scheduling_static {
 
     /// Method: Remove service `sid` and its usage from all assigned nodes.
     #[export]
-    fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) -> Result<(), Error> {
+    fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
         let mut usage = this.inner.lock().unwrap();
 
         if let Some(nodes) = usage.service_nodes.remove(sid) {
             for nodename in &nodes {
-                match usage.nodes.get_mut(nodename) {
-                    Some(node) => {
-                        node.services.remove(sid);
-                    }
-                    None => bail!(
-                        "service '{}' not present in usage hashmap on node '{}'",
-                        sid,
-                        nodename
-                    ),
+                if let Some(node) = usage.nodes.get_mut(nodename) {
+                    node.services.remove(sid);
                 }
             }
         }
-
-        Ok(())
     }
 
     /// Scores all previously added nodes for starting a `service` on.
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH perl-rs v3 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (10 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

This prepares for the upcoming pve_dynamic bindings, which share many
of the same code paths as the pve_static implementation.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 pve-rs/src/bindings/mod.rs                                    | 3 +--
 pve-rs/src/bindings/resource_scheduling/mod.rs                | 4 ++++
 .../pve_static.rs}                                            | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)
 create mode 100644 pve-rs/src/bindings/resource_scheduling/mod.rs
 rename pve-rs/src/bindings/{resource_scheduling_static.rs => resource_scheduling/pve_static.rs} (98%)

diff --git a/pve-rs/src/bindings/mod.rs b/pve-rs/src/bindings/mod.rs
index c21b328..853a3dd 100644
--- a/pve-rs/src/bindings/mod.rs
+++ b/pve-rs/src/bindings/mod.rs
@@ -3,8 +3,7 @@
 mod oci;
 pub use oci::pve_rs_oci;
 
-mod resource_scheduling_static;
-pub use resource_scheduling_static::pve_rs_resource_scheduling_static;
+pub mod resource_scheduling;
 
 mod tfa;
 pub use tfa::pve_rs_tfa;
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
new file mode 100644
index 0000000..af1fb6b
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -0,0 +1,4 @@
+//! Resource scheduling related bindings.
+
+mod pve_static;
+pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
similarity index 98%
rename from pve-rs/src/bindings/resource_scheduling_static.rs
rename to pve-rs/src/bindings/resource_scheduling/pve_static.rs
index b8eac57..a83a9ab 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -2,7 +2,7 @@
 pub mod pve_rs_resource_scheduling_static {
     //! The `PVE::RS::ResourceScheduling::Static` package.
     //!
-    //! Provides bindings for the resource scheduling module.
+    //! Provides bindings for the static resource scheduling module.
     //!
     //! See [`proxmox_resource_scheduling`].
 
-- 
2.47.3






* [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (11 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  7:40   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH perl-rs v3 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
                   ` (28 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The proxmox_resource_scheduling crate provides a generic usage
implementation, which is backwards compatible with the pve_static
bindings. This reduces the static resource scheduling bindings to a
slightly thinner wrapper.

This also exposes the new `add_resource(...)` binding, which allows
callers to add services with additional state beyond the usage stats.
It is exposed as `add_service(...)` to stay consistent with the naming
of the existing methods.

Where it is sensible for the bindings, the documentation is extended
with a link to the documentation of the underlying methods.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- require callers to handle the unset `current_node` invariant
  themselves, as this is pve-ha-manager-specific behavior and simplifies
  the logic a bit
- s/FromInto/From/ for PveResource<T> impl
- use kebab-case for (de)serialization of `PveResource<T>`
- make node_stats closure variable mutable instead of shadowing it in
  the closure body again in StartedResourceAggregator::aggregate()
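
The mapping implemented by the new `From<PveResource<T>> for Resource`
impl can be sketched in plain Rust. The enums below are simplified
stand-ins for the proxmox_resource_scheduling types, not the crate's
actual definitions:

```rust
// Stand-in types; the real ones live in proxmox_resource_scheduling.
#[derive(Debug, PartialEq)]
enum ResourceState {
    Started,
    Starting,
}

#[derive(Debug, PartialEq)]
enum ResourcePlacement {
    Stationary { current_node: String },
    Moving { current_node: String, target_node: String },
}

fn map_resource(
    running: bool,
    current_node: String,
    target_node: Option<String>,
) -> (ResourceState, ResourcePlacement) {
    // A running resource is Started; otherwise it is treated as Starting.
    let state = if running {
        ResourceState::Started
    } else {
        ResourceState::Starting
    };

    // A set migration target selects the Moving placement.
    let placement = match target_node {
        Some(target_node) => ResourcePlacement::Moving { current_node, target_node },
        None => ResourcePlacement::Stationary { current_node },
    };

    (state, placement)
}

fn main() {
    let (state, placement) = map_resource(true, "node1".into(), Some("node2".into()));
    assert_eq!(state, ResourceState::Started);
    assert_eq!(
        placement,
        ResourcePlacement::Moving {
            current_node: "node1".into(),
            target_node: "node2".into(),
        }
    );

    let (state, placement) = map_resource(false, "node1".into(), None);
    assert_eq!(state, ResourceState::Starting);
    assert_eq!(
        placement,
        ResourcePlacement::Stationary { current_node: "node1".into() }
    );
}
```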

 .../src/bindings/resource_scheduling/mod.rs   |   3 +
 .../resource_scheduling/pve_static.rs         | 154 ++++++------------
 .../bindings/resource_scheduling/resource.rs  |  41 +++++
 .../src/bindings/resource_scheduling/usage.rs |  33 ++++
 4 files changed, 130 insertions(+), 101 deletions(-)
 create mode 100644 pve-rs/src/bindings/resource_scheduling/resource.rs
 create mode 100644 pve-rs/src/bindings/resource_scheduling/usage.rs

diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index af1fb6b..9ce631c 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -1,4 +1,7 @@
 //! Resource scheduling related bindings.
 
+mod resource;
+mod usage;
+
 mod pve_static;
 pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index a83a9ab..5353db9 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -6,40 +6,34 @@ pub mod pve_rs_resource_scheduling_static {
     //!
     //! See [`proxmox_resource_scheduling`].
 
-    use std::collections::{HashMap, HashSet};
     use std::sync::Mutex;
 
-    use anyhow::{Error, bail};
+    use anyhow::Error;
 
     use perlmod::Value;
-    use proxmox_resource_scheduling::pve_static::{StaticNodeUsage, StaticServiceUsage};
+    use proxmox_resource_scheduling::node::NodeStats;
+    use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+    use proxmox_resource_scheduling::usage::Usage;
+
+    use crate::bindings::resource_scheduling::{
+        resource::PveResource, usage::StartedResourceAggregator,
+    };
 
     perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
 
-    struct StaticNodeInfo {
-        name: String,
-        maxcpu: usize,
-        maxmem: usize,
-        services: HashMap<String, StaticServiceUsage>,
-    }
-
-    struct Usage {
-        nodes: HashMap<String, StaticNodeInfo>,
-        service_nodes: HashMap<String, HashSet<String>>,
-    }
-
-    /// A scheduler instance contains the resource usage by node.
+    /// A scheduler instance contains the cluster usage.
     pub struct Scheduler {
         inner: Mutex<Usage>,
     }
 
+    type StaticResource = PveResource<StaticServiceUsage>;
+
     /// Class method: Create a new [`Scheduler`] instance.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::new`].
     #[export(raw_return)]
     pub fn new(#[raw] class: Value) -> Result<Value, Error> {
-        let inner = Usage {
-            nodes: HashMap::new(),
-            service_nodes: HashMap::new(),
-        };
+        let inner = Usage::new();
 
         Ok(perlmod::instantiate_magic!(
             &class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
@@ -48,7 +42,7 @@ pub mod pve_rs_resource_scheduling_static {
 
     /// Method: Add a node with its basic CPU and memory info.
     ///
-    /// This inserts a [`StaticNodeInfo`] entry for the node into the scheduler instance.
+    /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
     #[export]
     pub fn add_node(
         #[try_from_ref] this: &Scheduler,
@@ -58,33 +52,24 @@ pub mod pve_rs_resource_scheduling_static {
     ) -> Result<(), Error> {
         let mut usage = this.inner.lock().unwrap();
 
-        if usage.nodes.contains_key(&nodename) {
-            bail!("node {} already added", nodename);
-        }
-
-        let node = StaticNodeInfo {
-            name: nodename.clone(),
+        let stats = NodeStats {
+            cpu: 0.0,
             maxcpu,
+            mem: 0,
             maxmem,
-            services: HashMap::new(),
         };
 
-        usage.nodes.insert(nodename, node);
-        Ok(())
+        usage.add_node(nodename, stats)
     }
 
     /// Method: Remove a node from the scheduler.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
     #[export]
     pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
         let mut usage = this.inner.lock().unwrap();
 
-        if let Some(node) = usage.nodes.remove(nodename) {
-            for (sid, _) in node.services.iter() {
-                if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
-                    service_nodes.remove(nodename);
-                }
-            }
-        }
+        usage.remove_node(nodename);
     }
 
     /// Method: Get a list of all the nodes in the scheduler.
@@ -93,9 +78,8 @@ pub mod pve_rs_resource_scheduling_static {
         let usage = this.inner.lock().unwrap();
 
         usage
-            .nodes
-            .keys()
-            .map(|nodename| nodename.to_string())
+            .nodenames_iter()
+            .map(|nodename| nodename.to_owned())
             .collect()
     }
 
@@ -104,10 +88,26 @@ pub mod pve_rs_resource_scheduling_static {
     pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
         let usage = this.inner.lock().unwrap();
 
-        usage.nodes.contains_key(nodename)
+        usage.contains_node(nodename)
+    }
+
+    /// Method: Add `service` with identifier `sid` to the scheduler.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+    #[export]
+    pub fn add_service(
+        #[try_from_ref] this: &Scheduler,
+        sid: String,
+        service: StaticResource,
+    ) -> Result<(), Error> {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.add_resource(sid, service.into())
     }
 
     /// Method: Add service `sid` and its `service_usage` to the node.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::add_resource_usage_to_node`].
     #[export]
     pub fn add_service_usage_to_node(
         #[try_from_ref] this: &Scheduler,
@@ -117,81 +117,33 @@ pub mod pve_rs_resource_scheduling_static {
     ) -> Result<(), Error> {
         let mut usage = this.inner.lock().unwrap();
 
-        match usage.nodes.get_mut(nodename) {
-            Some(node) => {
-                if node.services.contains_key(sid) {
-                    bail!("service '{}' already added to node '{}'", sid, nodename);
-                }
-
-                node.services.insert(sid.to_string(), service_usage);
-            }
-            None => bail!("node '{}' not present in usage hashmap", nodename),
-        }
-
-        if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
-            if service_nodes.contains(nodename) {
-                bail!("node '{}' already added to service '{}'", nodename, sid);
-            }
-
-            service_nodes.insert(nodename.to_string());
-        } else {
-            let mut service_nodes = HashSet::new();
-            service_nodes.insert(nodename.to_string());
-            usage.service_nodes.insert(sid.to_string(), service_nodes);
-        }
-
-        Ok(())
+        // TODO Only for backwards compatibility, can be removed with a proper version bump
+        #[allow(deprecated)]
+        usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
     }
 
     /// Method: Remove service `sid` and its usage from all assigned nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
     #[export]
     fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
         let mut usage = this.inner.lock().unwrap();
 
-        if let Some(nodes) = usage.service_nodes.remove(sid) {
-            for nodename in &nodes {
-                if let Some(node) = usage.nodes.get_mut(nodename) {
-                    node.services.remove(sid);
-                }
-            }
-        }
+        usage.remove_resource(sid);
     }
 
-    /// Scores all previously added nodes for starting a `service` on.
+    /// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
     ///
-    /// Scoring is done according to the static memory and CPU usages of the nodes as if the
-    /// service would already be running on each.
-    ///
-    /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
-    /// score is better.
-    ///
-    /// See [`proxmox_resource_scheduling::pve_static::score_nodes_to_start_service`].
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
     #[export]
     pub fn score_nodes_to_start_service(
         #[try_from_ref] this: &Scheduler,
-        service: StaticServiceUsage,
+        service_stats: StaticServiceUsage,
     ) -> Result<Vec<(String, f64)>, Error> {
         let usage = this.inner.lock().unwrap();
-        let nodes = usage
-            .nodes
-            .values()
-            .map(|node| {
-                let mut node_usage = StaticNodeUsage {
-                    name: node.name.to_string(),
-                    cpu: 0.0,
-                    maxcpu: node.maxcpu,
-                    mem: 0,
-                    maxmem: node.maxmem,
-                };
 
-                for service in node.services.values() {
-                    node_usage.add_service_usage(service);
-                }
-
-                node_usage
-            })
-            .collect::<Vec<StaticNodeUsage>>();
-
-        proxmox_resource_scheduling::pve_static::score_nodes_to_start_service(&nodes, &service)
+        usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .score_nodes_to_start_resource(service_stats)
     }
 }
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
new file mode 100644
index 0000000..532e868
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -0,0 +1,41 @@
+use proxmox_resource_scheduling::resource::{
+    Resource, ResourcePlacement, ResourceState, ResourceStats,
+};
+
+use serde::{Deserialize, Serialize};
+
+/// A PVE resource.
+#[derive(Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct PveResource<T: Into<ResourceStats>> {
+    /// The resource's usage statistics.
+    stats: T,
+    /// Whether the resource is running.
+    running: bool,
+    /// The resource's current node.
+    current_node: String,
+    /// The resource's optional migration target node.
+    target_node: Option<String>,
+}
+
+impl<T: Into<ResourceStats>> From<PveResource<T>> for Resource {
+    fn from(resource: PveResource<T>) -> Self {
+        let state = if resource.running {
+            ResourceState::Started
+        } else {
+            ResourceState::Starting
+        };
+
+        let current_node = resource.current_node;
+        let placement = if let Some(target_node) = resource.target_node {
+            ResourcePlacement::Moving {
+                current_node,
+                target_node,
+            }
+        } else {
+            ResourcePlacement::Stationary { current_node }
+        };
+
+        Resource::new(resource.stats.into(), state, placement)
+    }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
new file mode 100644
index 0000000..17a8d4d
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -0,0 +1,33 @@
+use proxmox_resource_scheduling::{
+    scheduler::NodeUsage,
+    usage::{Usage, UsageAggregator},
+};
+
+/// An aggregator, which adds any resource as a started resource.
+///
+/// This aggregator is useful if the node base stats do not have any current usage.
+pub(crate) struct StartedResourceAggregator;
+
+impl UsageAggregator for StartedResourceAggregator {
+    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+        usage
+            .nodes_iter()
+            .map(|(nodename, node)| {
+                let stats = node
+                    .resources_iter()
+                    .fold(node.stats(), |mut node_stats, sid| {
+                        if let Some(resource) = usage.get_resource(sid) {
+                            node_stats.add_started_resource(&resource.stats());
+                        }
+
+                        node_stats
+                    });
+
+                NodeUsage {
+                    name: nodename.to_owned(),
+                    stats,
+                }
+            })
+            .collect()
+    }
+}
-- 
2.47.3






* [PATCH perl-rs v3 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (12 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH perl-rs v3 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

StaticServiceUsage is now marked as deprecated in
proxmox-resource-scheduling to make the crate independent of the
specific usage structs and their deserialization.

Therefore, define an equivalent struct in the pve_static bindings
module.

Though this is technically a Rust API break, the Perl side is
unaffected, as structs are serialized as plain Perl hashes either way.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 .../resource_scheduling/pve_static.rs         | 32 ++++++++++++++++---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index 5353db9..678fccb 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -9,10 +9,11 @@ pub mod pve_rs_resource_scheduling_static {
     use std::sync::Mutex;
 
     use anyhow::Error;
+    use serde::{Deserialize, Serialize};
 
     use perlmod::Value;
     use proxmox_resource_scheduling::node::NodeStats;
-    use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+    use proxmox_resource_scheduling::resource::ResourceStats;
     use proxmox_resource_scheduling::usage::Usage;
 
     use crate::bindings::resource_scheduling::{
@@ -26,7 +27,28 @@ pub mod pve_rs_resource_scheduling_static {
         inner: Mutex<Usage>,
     }
 
-    type StaticResource = PveResource<StaticServiceUsage>;
+    #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+    #[serde(rename_all = "kebab-case")]
+    /// Static usage stats of a resource.
+    pub struct StaticResourceStats {
+        /// Number of assigned CPUs or CPU limit.
+        pub maxcpu: f64,
+        /// Maximum assigned memory in bytes.
+        pub maxmem: usize,
+    }
+
+    impl From<StaticResourceStats> for ResourceStats {
+        fn from(stats: StaticResourceStats) -> Self {
+            Self {
+                cpu: stats.maxcpu,
+                maxcpu: stats.maxcpu,
+                mem: stats.maxmem,
+                maxmem: stats.maxmem,
+            }
+        }
+    }
+
+    type StaticResource = PveResource<StaticResourceStats>;
 
     /// Class method: Create a new [`Scheduler`] instance.
     ///
@@ -113,13 +135,13 @@ pub mod pve_rs_resource_scheduling_static {
         #[try_from_ref] this: &Scheduler,
         nodename: &str,
         sid: &str,
-        service_usage: StaticServiceUsage,
+        service_stats: StaticResourceStats,
     ) -> Result<(), Error> {
         let mut usage = this.inner.lock().unwrap();
 
         // TODO Only for backwards compatibility, can be removed with a proper version bump
         #[allow(deprecated)]
-        usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
+        usage.add_resource_usage_to_node(nodename, sid, service_stats.into())
     }
 
     /// Method: Remove service `sid` and its usage from all assigned nodes.
@@ -138,7 +160,7 @@ pub mod pve_rs_resource_scheduling_static {
     #[export]
     pub fn score_nodes_to_start_service(
         #[try_from_ref] this: &Scheduler,
-        service_stats: StaticServiceUsage,
+        service_stats: StaticResourceStats,
     ) -> Result<Vec<(String, f64)>, Error> {
         let usage = this.inner.lock().unwrap();
 
-- 
2.47.3






* [PATCH perl-rs v3 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (13 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH perl-rs v3 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The implementation is similar to pve_static, but extends the node and
resource stats with sampled runtime usage statistics, i.e., the actual
usage of the nodes and the actual usage of the resources.

When callers repeatedly invoke score_nodes_to_start_resource() and then
add the scored resources as starting resources with add_resource(),
these starting resources need to be accumulated on top of the nodes'
actual current usage to prevent score_nodes_to_start_resource() from
favoring the currently least loaded node(s) for all starting resources.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- make node_stats closure variable mutable instead of shadowing it in
  the closure body again in StartedResourceAggregator::aggregate()
  (as suggested by @Dominik)
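
The accumulation described above boils down to a fold over each node's
resources; this is a self-contained sketch with simplified stand-in
types, not the crate's actual API:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the node/resource stats types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stats {
    cpu: f64,
    mem: u64,
}

#[derive(Clone, Copy, PartialEq)]
enum State {
    Started,
    Starting,
}

struct Resource {
    stats: Stats,
    state: State,
}

/// Fold the usage of all *starting* resources assigned to a node onto the
/// node's sampled base stats; started resources are already contained in
/// the sampled base stats and are skipped.
fn aggregate_node(
    base: Stats,
    node_resources: &[&str],
    resources: &HashMap<&str, Resource>,
) -> Stats {
    node_resources.iter().fold(base, |mut acc, sid| {
        if let Some(res) = resources.get(sid) {
            if res.state == State::Starting {
                acc.cpu += res.stats.cpu;
                acc.mem += res.stats.mem;
            }
        }
        acc
    })
}

fn main() {
    let mut resources = HashMap::new();
    resources.insert(
        "vm:100",
        Resource { stats: Stats { cpu: 1.0, mem: 2048 }, state: State::Started },
    );
    resources.insert(
        "vm:101",
        Resource { stats: Stats { cpu: 0.5, mem: 1024 }, state: State::Starting },
    );

    // Only the starting resource is added on top of the sampled base stats.
    let stats = aggregate_node(
        Stats { cpu: 2.0, mem: 4096 },
        &["vm:100", "vm:101"],
        &resources,
    );
    assert_eq!(stats, Stats { cpu: 2.5, mem: 5120 });
}
```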

 pve-rs/Makefile                               |   1 +
 .../src/bindings/resource_scheduling/mod.rs   |   3 +
 .../resource_scheduling/pve_dynamic.rs        | 174 ++++++++++++++++++
 .../src/bindings/resource_scheduling/usage.rs |  33 ++++
 pve-rs/test/resource_scheduling.pl            |   1 +
 5 files changed, 212 insertions(+)
 create mode 100644 pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs

diff --git a/pve-rs/Makefile b/pve-rs/Makefile
index 3bbc464..c2f9b73 100644
--- a/pve-rs/Makefile
+++ b/pve-rs/Makefile
@@ -30,6 +30,7 @@ PERLMOD_PACKAGES := \
 	  PVE::RS::OCI \
 	  PVE::RS::OpenId \
 	  PVE::RS::ResourceScheduling::Static \
+	  PVE::RS::ResourceScheduling::Dynamic \
 	  PVE::RS::SDN::Fabrics \
 	  PVE::RS::SDN \
 	  PVE::RS::TFA
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index 9ce631c..87b4a03 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -5,3 +5,6 @@ mod usage;
 
 mod pve_static;
 pub use pve_static::pve_rs_resource_scheduling_static;
+
+mod pve_dynamic;
+pub use pve_dynamic::pve_rs_resource_scheduling_dynamic;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
new file mode 100644
index 0000000..27ccf39
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -0,0 +1,174 @@
+#[perlmod::package(name = "PVE::RS::ResourceScheduling::Dynamic", lib = "pve_rs")]
+pub mod pve_rs_resource_scheduling_dynamic {
+    //! The `PVE::RS::ResourceScheduling::Dynamic` package.
+    //!
+    //! Provides bindings for the dynamic resource scheduling module.
+    //!
+    //! See [`proxmox_resource_scheduling`].
+
+    use std::sync::Mutex;
+
+    use anyhow::Error;
+    use serde::{Deserialize, Serialize};
+
+    use perlmod::Value;
+    use proxmox_resource_scheduling::node::NodeStats;
+    use proxmox_resource_scheduling::resource::ResourceStats;
+    use proxmox_resource_scheduling::usage::Usage;
+
+    use crate::bindings::resource_scheduling::resource::PveResource;
+    use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+
+    perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
+
+    /// A scheduler instance contains the cluster usage.
+    pub struct Scheduler {
+        inner: Mutex<Usage>,
+    }
+
+    #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+    #[serde(rename_all = "kebab-case")]
+    /// Dynamic usage stats of a node.
+    pub struct DynamicNodeStats {
+        /// CPU utilization in CPU cores.
+        pub cpu: f64,
+        /// Total number of CPU cores.
+        pub maxcpu: usize,
+        /// Used memory in bytes.
+        pub mem: usize,
+        /// Total memory in bytes.
+        pub maxmem: usize,
+    }
+
+    impl From<DynamicNodeStats> for NodeStats {
+        fn from(value: DynamicNodeStats) -> Self {
+            Self {
+                cpu: value.cpu,
+                maxcpu: value.maxcpu,
+                mem: value.mem,
+                maxmem: value.maxmem,
+            }
+        }
+    }
+
+    #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+    #[serde(rename_all = "kebab-case")]
+    /// Dynamic usage stats of a resource.
+    pub struct DynamicResourceStats {
+        /// CPU utilization in CPU cores.
+        pub cpu: f64,
+        /// Number of assigned CPUs or CPU limit.
+        pub maxcpu: f64,
+        /// Used memory in bytes.
+        pub mem: usize,
+        /// Maximum assigned memory in bytes.
+        pub maxmem: usize,
+    }
+
+    impl From<DynamicResourceStats> for ResourceStats {
+        fn from(value: DynamicResourceStats) -> Self {
+            Self {
+                cpu: value.cpu,
+                maxcpu: value.maxcpu,
+                mem: value.mem,
+                maxmem: value.maxmem,
+            }
+        }
+    }
+
+    type DynamicResource = PveResource<DynamicResourceStats>;
+
+    /// Class method: Create a new [`Scheduler`] instance.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::new`].
+    #[export(raw_return)]
+    pub fn new(#[raw] class: Value) -> Result<Value, Error> {
+        let inner = Usage::new();
+
+        Ok(perlmod::instantiate_magic!(
+            &class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
+        ))
+    }
+
+    /// Method: Add a node with its basic CPU and memory info.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
+    #[export]
+    pub fn add_node(
+        #[try_from_ref] this: &Scheduler,
+        nodename: String,
+        stats: DynamicNodeStats,
+    ) -> Result<(), Error> {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.add_node(nodename, stats.into())
+    }
+
+    /// Method: Remove a node from the scheduler.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
+    #[export]
+    pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.remove_node(nodename);
+    }
+
+    /// Method: Get a list of all the nodes in the scheduler.
+    #[export]
+    pub fn list_nodes(#[try_from_ref] this: &Scheduler) -> Vec<String> {
+        let usage = this.inner.lock().unwrap();
+
+        usage
+            .nodenames_iter()
+            .map(|nodename| nodename.to_owned())
+            .collect()
+    }
+
+    /// Method: Check whether a node exists in the scheduler.
+    #[export]
+    pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
+        let usage = this.inner.lock().unwrap();
+
+        usage.contains_node(nodename)
+    }
+
+    /// Method: Add `resource` with identifier `sid` to the scheduler.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+    #[export]
+    pub fn add_resource(
+        #[try_from_ref] this: &Scheduler,
+        sid: String,
+        resource: DynamicResource,
+    ) -> Result<(), Error> {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.add_resource(sid, resource.into())
+    }
+
+    /// Method: Remove resource `sid` and its usage from all assigned nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
+    #[export]
+    fn remove_resource(#[try_from_ref] this: &Scheduler, sid: &str) {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.remove_resource(sid);
+    }
+
+    /// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
+    #[export]
+    pub fn score_nodes_to_start_resource(
+        #[try_from_ref] this: &Scheduler,
+        resource_stats: DynamicResourceStats,
+    ) -> Result<Vec<(String, f64)>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        usage
+            .to_scheduler::<StartingAsStartedResourceAggregator>()
+            .score_nodes_to_start_resource(resource_stats)
+    }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index 17a8d4d..d56a423 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -1,4 +1,5 @@
 use proxmox_resource_scheduling::{
+    resource::ResourceState,
     scheduler::NodeUsage,
     usage::{Usage, UsageAggregator},
 };
@@ -31,3 +32,35 @@ impl UsageAggregator for StartedResourceAggregator {
             .collect()
     }
 }
+
+/// An aggregator, which uses the node base stats and adds any starting resources as already
+/// started resources to the node stats.
+///
+/// This aggregator is useful if starting resources should be considered in the scheduler.
+pub(crate) struct StartingAsStartedResourceAggregator;
+
+impl UsageAggregator for StartingAsStartedResourceAggregator {
+    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+        usage
+            .nodes_iter()
+            .map(|(nodename, node)| {
+                let stats = node
+                    .resources_iter()
+                    .fold(node.stats(), |mut node_stats, sid| {
+                        if let Some(resource) = usage.get_resource(sid)
+                            && resource.state() == ResourceState::Starting
+                        {
+                            node_stats.add_started_resource(&resource.stats());
+                        }
+
+                        node_stats
+                    });
+
+                NodeUsage {
+                    name: nodename.to_owned(),
+                    stats,
+                }
+            })
+            .collect()
+    }
+}
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
index a332269..3775242 100755
--- a/pve-rs/test/resource_scheduling.pl
+++ b/pve-rs/test/resource_scheduling.pl
@@ -6,6 +6,7 @@ use warnings;
 use Test::More;
 
 use PVE::RS::ResourceScheduling::Static;
+use PVE::RS::ResourceScheduling::Dynamic;
 
 my sub score_nodes {
     my ($static, $service) = @_;
-- 
2.47.3






* [PATCH perl-rs v3 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (14 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH cluster v3 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These methods expose the auto rebalancing methods of both the static and
dynamic scheduler.

As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
takes a possibly very large list of migration candidates, the binding
takes a more compact representation, which reduces the amount of data
that needs to be generated on the caller's side and therefore the
runtime of the serialization from Perl to Rust.

Additionally, while decomposing the compact representation the input
data is validated since the underlying scoring methods do not further
validate whether their input is consistent with the cluster usage.

The method names score_best_balancing_migration_candidates{,_topsis}()
are chosen deliberately, so that future extensions can implement
score_best_balancing_migrations{,_topsis}(), which might allow scoring
migrations without the caller providing the candidates.
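The compact-to-full expansion is essentially a cartesian product of each
candidate bundle with its possible target nodes. A rough sketch with
simplified stand-in types (the names below are illustrative, not the actual
proxmox-resource-scheduling types, and the validation described above is
omitted):

```rust
// Illustrative expansion of a compact candidate into one migration per
// (leader, target node) pair. `CompactCandidate` and `Migration` here are
// simplified stand-ins for the real pve-rs binding types.
struct CompactCandidate {
    leader: String,
    nodes: Vec<String>, // possible target nodes for the bundle
}

struct Migration {
    sid: String,
    source_node: String,
    target_node: String,
}

fn expand(source_node: &str, compact: Vec<CompactCandidate>) -> Vec<Migration> {
    let mut out = Vec::new();
    for cand in compact {
        for target_node in cand.nodes {
            out.push(Migration {
                sid: cand.leader.clone(),
                source_node: source_node.to_owned(),
                target_node,
            });
        }
    }
    out
}
```

This is why the compact form pays off: the caller serializes each bundle and
its node list once instead of one full entry per (bundle, node) pair.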

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 .../resource_scheduling/pve_dynamic.rs        | 57 +++++++++++-
 .../resource_scheduling/pve_static.rs         | 56 +++++++++++-
 .../bindings/resource_scheduling/resource.rs  | 89 ++++++++++++++++++-
 .../src/bindings/resource_scheduling/usage.rs | 15 ++++
 4 files changed, 212 insertions(+), 5 deletions(-)

diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
index 27ccf39..8e066e0 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -14,10 +14,15 @@ pub mod pve_rs_resource_scheduling_dynamic {
     use perlmod::Value;
     use proxmox_resource_scheduling::node::NodeStats;
     use proxmox_resource_scheduling::resource::ResourceStats;
+    use proxmox_resource_scheduling::scheduler::ScoredMigration;
     use proxmox_resource_scheduling::usage::Usage;
 
-    use crate::bindings::resource_scheduling::resource::PveResource;
-    use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+    use crate::bindings::resource_scheduling::resource::{
+        CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+    };
+    use crate::bindings::resource_scheduling::usage::{
+        IdentityAggregator, StartingAsStartedResourceAggregator,
+    };
 
     perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
 
@@ -157,6 +162,54 @@ pub mod pve_rs_resource_scheduling_dynamic {
         usage.remove_resource(sid);
     }
 
+    /// Method: Returns the load imbalance among the nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+    #[export]
+    pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+        let usage = this.inner.lock().unwrap();
+
+        usage.to_scheduler::<IdentityAggregator>().node_imbalance()
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// exhaustive search.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        Ok(usage
+            .to_scheduler::<IdentityAggregator>()
+            .score_best_balancing_migration_candidates(candidates, limit))
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// the TOPSIS method.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates_topsis(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        usage
+            .to_scheduler::<IdentityAggregator>()
+            .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+    }
+
     /// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
     ///
     /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index 678fccb..d83aa38 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -14,10 +14,14 @@ pub mod pve_rs_resource_scheduling_static {
     use perlmod::Value;
     use proxmox_resource_scheduling::node::NodeStats;
     use proxmox_resource_scheduling::resource::ResourceStats;
+    use proxmox_resource_scheduling::scheduler::ScoredMigration;
     use proxmox_resource_scheduling::usage::Usage;
 
     use crate::bindings::resource_scheduling::{
-        resource::PveResource, usage::StartedResourceAggregator,
+        resource::{
+            CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+        },
+        usage::StartedResourceAggregator,
     };
 
     perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
@@ -154,6 +158,56 @@ pub mod pve_rs_resource_scheduling_static {
         usage.remove_resource(sid);
     }
 
+    /// Method: Returns the load imbalance among the nodes.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+    #[export]
+    pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+        let usage = this.inner.lock().unwrap();
+
+        usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .node_imbalance()
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// exhaustive search.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        Ok(usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .score_best_balancing_migration_candidates(candidates, limit))
+    }
+
+    /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+    /// the TOPSIS method.
+    ///
+    /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+    #[export]
+    pub fn score_best_balancing_migration_candidates_topsis(
+        #[try_from_ref] this: &Scheduler,
+        candidates: Vec<CompactMigrationCandidate>,
+        limit: usize,
+    ) -> Result<Vec<ScoredMigration>, Error> {
+        let usage = this.inner.lock().unwrap();
+
+        let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+        usage
+            .to_scheduler::<StartedResourceAggregator>()
+            .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+    }
+
     /// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
     ///
     /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
index 532e868..63fdd5f 100644
--- a/pve-rs/src/bindings/resource_scheduling/resource.rs
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -1,5 +1,8 @@
-use proxmox_resource_scheduling::resource::{
-    Resource, ResourcePlacement, ResourceState, ResourceStats,
+use anyhow::{Error, bail};
+use proxmox_resource_scheduling::{
+    resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+    scheduler::{Migration, MigrationCandidate},
+    usage::Usage,
 };
 
 use serde::{Deserialize, Serialize};
@@ -39,3 +42,85 @@ impl<T: Into<ResourceStats>> From<PveResource<T>> for Resource {
         Resource::new(resource.stats.into(), state, placement)
     }
 }
+
+/// A compact representation of [`proxmox_resource_scheduling::scheduler::MigrationCandidate`].
+#[derive(Serialize, Deserialize)]
+pub struct CompactMigrationCandidate {
+    /// The identifier of the leading resource.
+    pub leader: String,
+    /// The resources which are part of the leading resource's bundle.
+    pub resources: Vec<String>,
+    /// The nodes to which the resources can possibly be migrated.
+    pub nodes: Vec<String>,
+}
+
+/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
+/// usage from `usage`.
+///
+/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
+///
+/// - the `leader` is not present in the cluster usage
+/// - the `leader` is non-stationary
+/// - any resource in `resources` is not present in the cluster usage
+/// - any resource in `resources` is non-stationary
+/// - any resource in `resources` is on another node than the `leader`
+pub(crate) fn decompose_compact_migration_candidates(
+    usage: &Usage,
+    compact_candidates: Vec<CompactMigrationCandidate>,
+) -> Result<Vec<MigrationCandidate>, Error> {
+    // The length of `compact_candidates` is a lower bound for the number of candidates
+    let mut candidates = Vec::with_capacity(compact_candidates.len());
+
+    for candidate in compact_candidates.into_iter() {
+        let leader_sid = candidate.leader;
+        let leader = match usage.get_resource(&leader_sid) {
+            Some(resource) => resource,
+            _ => bail!("leader '{leader_sid}' is not present in the cluster usage"),
+        };
+        let leader_node = match leader.placement() {
+            ResourcePlacement::Stationary { current_node } => current_node,
+            _ => bail!("leader '{leader_sid}' is non-stationary"),
+        };
+
+        if !candidate.resources.contains(&leader_sid) {
+            bail!("leader '{leader_sid}' is not present in the resources list");
+        }
+
+        let mut resource_stats = Vec::with_capacity(candidate.resources.len());
+
+        for sid in candidate.resources.iter() {
+            let resource = match usage.get_resource(sid) {
+                Some(resource) => resource,
+                _ => bail!("resource '{sid}' is not present in the cluster usage"),
+            };
+
+            match resource.placement() {
+                ResourcePlacement::Stationary { current_node } => {
+                    if current_node != leader_node {
+                        bail!("resource '{sid}' is on another node than the leader");
+                    }
+
+                    resource_stats.push(resource.stats());
+                }
+                _ => bail!("resource '{sid}' is non-stationary"),
+            }
+        }
+
+        let bundle_stats = resource_stats.into_iter().sum();
+
+        for target_node in candidate.nodes.into_iter() {
+            let migration = Migration {
+                sid: leader_sid.to_owned(),
+                source_node: leader_node.to_owned(),
+                target_node,
+            };
+
+            candidates.push(MigrationCandidate {
+                migration,
+                stats: bundle_stats,
+            });
+        }
+    }
+
+    Ok(candidates)
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index d56a423..e8c4ae9 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -4,6 +4,21 @@ use proxmox_resource_scheduling::{
     usage::{Usage, UsageAggregator},
 };
 
+/// The identity aggregator, which passes the node stats as-is.
+pub(crate) struct IdentityAggregator;
+
+impl UsageAggregator for IdentityAggregator {
+    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+        usage
+            .nodes_iter()
+            .map(|(nodename, node)| NodeUsage {
+                name: nodename.to_owned(),
+                stats: node.stats(),
+            })
+            .collect()
+    }
+}
+
 /// An aggregator, which adds any resource as a started resource.
 ///
 /// This aggregator is useful if the node base stats do not have any current usage.
-- 
2.47.3






* [PATCH cluster v3 17/40] datacenter config: restructure verbose description for the ha crs option
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (15 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH perl-rs v3 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH cluster v3 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

This makes it a little easier to read and allows appending descriptions
for other values with a cleaner diff.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- s/is/are/ for line with "with 'static'..." (as suggested by @Dominik)

 src/PVE/DataCenterConfig.pm | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index d88b167..c275163 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -17,9 +17,12 @@ my $crs_format = {
         optional => 1,
         default => 'basic',
         description => "Use this resource scheduler mode for HA.",
-        verbose_description => "Configures how the HA manager should select nodes to start or "
-            . "recover services. With 'basic', only the number of services is used, with 'static', "
-            . "static CPU and memory configuration of services is considered.",
+        verbose_description => <<EODESC,
+Configures how the HA Manager should select nodes to start or recover services:
+
+- with 'basic', only the number of services is used,
+- with 'static', static CPU and memory configuration of services are considered.
+EODESC
     },
     'ha-rebalance-on-start' => {
         type => 'boolean',
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH cluster v3 18/40] datacenter config: add dynamic load scheduler option
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (16 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH cluster v3 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options Daniel Kral
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- s/is/are/ in dynamic line (as suggested by @Dominik)

 src/PVE/DataCenterConfig.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index c275163..0225bc6 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -13,7 +13,7 @@ my $PROXMOX_OUI = 'BC:24:11';
 my $crs_format = {
     ha => {
         type => 'string',
-        enum => ['basic', 'static'],
+        enum => ['basic', 'static', 'dynamic'],
         optional => 1,
         default => 'basic',
         description => "Use this resource scheduler mode for HA.",
@@ -21,7 +21,8 @@ my $crs_format = {
 Configures how the HA Manager should select nodes to start or recover services:
 
 - with 'basic', only the number of services is used,
-- with 'static', static CPU and memory configuration of services are considered.
+- with 'static', static CPU and memory configuration of services are considered,
+- with 'dynamic', static and dynamic CPU and memory usage of services are considered.
 EODESC
     },
     'ha-rebalance-on-start' => {
-- 
2.47.3






* [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (17 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH cluster v3 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  7:52   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
                   ` (22 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- clarify unit for ha-auto-rebalance-hold-duration as suggested by
  @Jillian Morgan and @Dominik
- "The threshold for cluster node imbalance" instead of "The threshold
  for node load", where the latter is not really representative
- further improve wording a bit, so another read would be very
  appreciated!
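To illustrate how these options could interact — inferred purely from the
option descriptions, the actual HA manager logic may well differ — a minimal
sketch of the gating: the node imbalance must exceed the threshold for
hold-duration consecutive rounds before rebalancing triggers, and a candidate
migration is only committed when its relative imbalance improvement reaches
the margin:

```rust
// Hedged sketch of the three options' interplay; names and semantics are
// assumptions based on the config descriptions, not the real implementation.
struct RebalanceGate {
    threshold: f64,     // ha-auto-rebalance-threshold
    hold_duration: u32, // ha-auto-rebalance-hold-duration (HA rounds)
    margin: f64,        // ha-auto-rebalance-margin
    rounds_over: u32,   // consecutive rounds above the threshold so far
}

impl RebalanceGate {
    // Called once per HA round with the current cluster node imbalance.
    fn should_rebalance(&mut self, imbalance: f64) -> bool {
        if imbalance > self.threshold {
            self.rounds_over += 1;
        } else {
            self.rounds_over = 0;
        }
        self.rounds_over >= self.hold_duration
    }

    // Commit a scored migration only if it improves the imbalance enough.
    fn accept_migration(&self, before: f64, after: f64) -> bool {
        before > 0.0 && (before - after) / before >= self.margin
    }
}
```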

 src/PVE/DataCenterConfig.pm | 41 +++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 0225bc6..41f56ef 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -33,6 +33,47 @@ EODESC
             "Set to use CRS for selecting a suited node when a HA services request-state"
             . " changes from stop to start.",
     },
+    'ha-auto-rebalance' => {
+        type => 'boolean',
+        optional => 1,
+        default => 0,
+        description => "Whether to use CRS for balancing HA resources automatically"
+            . " depending on the current node imbalance.",
+    },
+    'ha-auto-rebalance-threshold' => {
+        type => 'number',
+        optional => 1,
+        default => 0.7,
+        requires => 'ha-auto-rebalance',
+        description => "The threshold for the cluster node imbalance, which will"
+            . " trigger the automatic resource balancing system if its value"
+            . " is exceeded.",
+    },
+    'ha-auto-rebalance-method' => {
+        type => 'string',
+        enum => ['bruteforce', 'topsis'],
+        optional => 1,
+        default => 'bruteforce',
+        requires => 'ha-auto-rebalance',
+        description => "The method to use for the scoring of balancing migrations.",
+    },
+    'ha-auto-rebalance-hold-duration' => {
+        type => 'number',
+        optional => 1,
+        default => 3,
+        requires => 'ha-auto-rebalance',
+        description => "The number of HA rounds for which the cluster node"
+            . " imbalance threshold must be exceeded before triggering an"
+            . " automatic resource balancing migration.",
+    },
+    'ha-auto-rebalance-margin' => {
+        type => 'number',
+        optional => 1,
+        default => 0.1,
+        requires => 'ha-auto-rebalance',
+        description => "The minimum relative improvement in cluster node"
+            . " imbalance to commit to a resource balancing migration.",
+    },
 };
 
 my $migration_format = {
-- 
2.47.3






* [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (18 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31 13:25   ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 21/40] sim: hardware: pass correct types for static stats Daniel Kral
                   ` (21 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Fetch the dynamic node and service stats with rrd_dump(); these stats
are periodically sampled and broadcast by each PVE node's pvestatd
service and propagated through the pmxcfs.
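Both helpers in this patch normalize the RRD entries the same way: the
cpu column is a fraction of maxcpu, so multiplying by maxcpu yields the
load in cores, while the memory columns are taken as-is. A simplified
sketch (field names mirror the Perl hash keys; this is an illustration,
not the actual implementation):

```rust
// Sketch of the stat normalization used by get_dynamic_{service,node}_stats():
// cpu is stored as a fraction of maxcpu in the RRD entry and is converted
// to a core-count load here.
struct DynamicStats {
    maxcpu: f64,
    cpu: f64,    // load in cores
    maxmem: u64, // bytes
    mem: u64,    // bytes
}

fn normalize(rrd_maxcpu: f64, rrd_cpu_fraction: f64, rrd_maxmem: f64, rrd_mem: f64) -> DynamicStats {
    DynamicStats {
        maxcpu: rrd_maxcpu,
        cpu: rrd_cpu_fraction * rrd_maxcpu,
        maxmem: rrd_maxmem as u64,
        mem: rrd_mem as u64,
    }
}
```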

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- remove unused $id parameter from get_dynamic_service_stats() as
  suggested by @Thomas and @Dominik

 src/PVE/HA/Env/PVE2.pm | 63 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 04cd1bfe..b2488ddd 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -42,6 +42,19 @@ my $lockdir = "/etc/pve/priv/lock";
 # taken from PVE::Service::pvestatd::update_{lxc,qemu}_status()
 use constant {
     RRD_VM_INDEX_STATUS => 2,
+    RRD_VM_INDEX_MAXCPU => 5,
+    RRD_VM_INDEX_CPU => 6,
+    RRD_VM_INDEX_MAXMEM => 7,
+    RRD_VM_INDEX_MEM => 8,
+};
+
+# rrd entry indices for PVE nodes
+# taken from PVE::Service::pvestatd::update_node_status()
+use constant {
+    RRD_NODE_INDEX_MAXCPU => 4,
+    RRD_NODE_INDEX_CPU => 5,
+    RRD_NODE_INDEX_MAXMEM => 7,
+    RRD_NODE_INDEX_MEM => 8,
 };
 
 sub new {
@@ -569,6 +582,30 @@ sub get_static_service_stats {
     return $stats;
 }
 
+sub get_dynamic_service_stats {
+    my ($self) = @_;
+
+    my $rrd = PVE::Cluster::rrd_dump();
+
+    my $stats = get_cluster_service_stats();
+    for my $sid (keys %$stats) {
+        my $id = $stats->{$sid}->{id};
+        my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
+
+        # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
+        my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
+
+        $stats->{$sid}->{usage} = {
+            maxcpu => $maxcpu,
+            cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+            maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
+            mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
+        };
+    }
+
+    return $stats;
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
@@ -588,6 +625,32 @@ sub get_static_node_stats {
     return $stats;
 }
 
+sub get_dynamic_node_stats {
+    my ($self) = @_;
+
+    my $rrd = PVE::Cluster::rrd_dump();
+
+    my $stats = {};
+    for my $key (keys %$rrd) {
+        my ($nodename) = $key =~ m/^pve-node-9.0\/(\w+)$/;
+
+        next if !$nodename;
+
+        my $rrdentry = $rrd->{$key} // [];
+
+        my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
+
+        $stats->{$nodename} = {
+            maxcpu => $maxcpu,
+            cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+            maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
+            mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
+        };
+    }
+
+    return $stats;
+}
+
 sub get_node_version {
     my ($self, $node) = @_;
 
-- 
2.47.3






* [PATCH ha-manager v3 21/40] sim: hardware: pass correct types for static stats
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (19 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 22/40] sim: hardware: factor out static stats' default values Daniel Kral
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

CRM expects f64 for cpu-related values and usize for mem-related values.
Hence, pass doubles for the former and ints for the latter.

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Sim/Hardware.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 474cee16..cfcd7ab1 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -488,9 +488,9 @@ sub new {
             || die "Copy failed: $!\n";
     } else {
         my $cstatus = {
-            node1 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
-            node2 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
-            node3 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
+            node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+            node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+            node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
         };
         $self->write_hardware_status_nolock($cstatus);
     }
@@ -507,7 +507,7 @@ sub new {
         copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
     } else {
         my $services = $self->read_service_config();
-        my $stats = { map { $_ => { maxcpu => 4, maxmem => 4096 } } keys %$services };
+        my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
         $self->write_static_service_stats($stats);
     }
 
@@ -883,7 +883,7 @@ sub sim_hardware_cmd {
 
                 $self->set_static_service_stats(
                     $sid,
-                    { maxcpu => $params[0], maxmem => $params[1] },
+                    { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
                 );
 
             } elsif ($action eq 'manual-migrate') {
-- 
2.47.3






* [PATCH ha-manager v3 22/40] sim: hardware: factor out static stats' default values
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (20 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 21/40] sim: hardware: pass correct types for static stats Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 23/40] sim: hardware: fix static stats guard Daniel Kral
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Sim/Hardware.pm | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index cfcd7ab1..34d67754 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,6 +21,11 @@ use PVE::HA::Groups;
 
 my $watchdog_timeout = 60;
 
+my $default_service_maxcpu = 4.0;
+my $default_service_maxmem = 4096 * 1024**2;
+my $default_node_maxcpu = 24.0;
+my $default_node_maxmem = 131072 * 1024**2;
+
 # Status directory layout
 #
 # configuration
@@ -488,9 +493,24 @@ sub new {
             || die "Copy failed: $!\n";
     } else {
         my $cstatus = {
-            node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
-            node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
-            node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+            node1 => {
+                power => 'off',
+                network => 'off',
+                maxcpu => $default_node_maxcpu,
+                maxmem => $default_node_maxmem,
+            },
+            node2 => {
+                power => 'off',
+                network => 'off',
+                maxcpu => $default_node_maxcpu,
+                maxmem => $default_node_maxmem,
+            },
+            node3 => {
+                power => 'off',
+                network => 'off',
+                maxcpu => $default_node_maxcpu,
+                maxmem => $default_node_maxmem,
+            },
         };
         $self->write_hardware_status_nolock($cstatus);
     }
@@ -507,7 +527,12 @@ sub new {
         copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
     } else {
         my $services = $self->read_service_config();
-        my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
+        my $stats = {
+            map {
+                $_ => { maxcpu => $default_service_maxcpu, maxmem => $default_service_maxmem }
+            }
+                keys %$services
+        };
         $self->write_static_service_stats($stats);
     }
 
-- 
2.47.3






* [PATCH ha-manager v3 23/40] sim: hardware: fix static stats guard
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (21 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 22/40] sim: hardware: factor out static stats' default values Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 24/40] sim: hardware: handle dynamic service stats Daniel Kral
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

While falsy, values of 0 or 0.0 are valid stats. Hence, use a
'defined' check to avoid skipping such falsy static service stats.

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Sim/Hardware.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 34d67754..f6c3d902 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -202,11 +202,11 @@ sub set_static_service_stats {
 
     my $stats = $self->read_static_service_stats();
 
-    if (my $memory = $new_stats->{maxmem}) {
+    if (defined(my $memory = $new_stats->{maxmem})) {
         $stats->{$sid}->{maxmem} = $memory;
     }
 
-    if (my $cpu = $new_stats->{maxcpu}) {
+    if (defined(my $cpu = $new_stats->{maxcpu})) {
         $stats->{$sid}->{maxcpu} = $cpu;
     }
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 24/40] sim: hardware: handle dynamic service stats
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (22 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 23/40] sim: hardware: fix static stats guard Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

This adds functionality to simulate dynamic stats of a service, that is,
CPU load (in cores) and memory usage (in MiB).

Analogous to the static service stats, dynamic service stats can be
specified within tests in the file dynamic_service_stats.
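
Going by the reader added below (PVE::HA::Tools::read_json_from_file),
the file is JSON keyed by service ID; a hypothetical example, with cpu
in cores and mem in bytes to match the defaults introduced in the hunk:

```json
{
    "vm:101": { "cpu": 1.5, "mem": 536870912 },
    "ct:102": { "cpu": 0.5, "mem": 268435456 }
}
```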

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Sim/Hardware.pm | 52 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index f6c3d902..3dda0c0f 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,8 +21,11 @@ use PVE::HA::Groups;
 
 my $watchdog_timeout = 60;
 
+my $default_service_cpu = 2.0;
 my $default_service_maxcpu = 4.0;
+my $default_service_mem = 2048 * 1024**2;
 my $default_service_maxmem = 4096 * 1024**2;
+
 my $default_node_maxcpu = 24.0;
 my $default_node_maxmem = 131072 * 1024**2;
 
@@ -213,6 +216,25 @@ sub set_static_service_stats {
     $self->write_static_service_stats($stats);
 }
 
+sub set_dynamic_service_stats {
+    my ($self, $sid, $new_stats) = @_;
+
+    my $conf = $self->read_service_config();
+    die "no such service '$sid'" if !$conf->{$sid};
+
+    my $stats = $self->read_dynamic_service_stats();
+
+    if (defined(my $memory = $new_stats->{mem})) {
+        $stats->{$sid}->{mem} = $memory;
+    }
+
+    if (defined(my $cpu = $new_stats->{cpu})) {
+        $stats->{$sid}->{cpu} = $cpu;
+    }
+
+    $self->write_dynamic_service_stats($stats);
+}
+
 sub add_service {
     my ($self, $sid, $opts, $running) = @_;
 
@@ -438,6 +460,16 @@ sub read_static_service_stats {
     return $stats;
 }
 
+sub read_dynamic_service_stats {
+    my ($self) = @_;
+
+    my $filename = "$self->{statusdir}/dynamic_service_stats";
+    my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
+    $self->log('error', "loading dynamic service stats failed - $@") if $@;
+
+    return $stats;
+}
+
 sub write_static_service_stats {
     my ($self, $stats) = @_;
 
@@ -446,6 +478,14 @@ sub write_static_service_stats {
     $self->log('error', "writing static service stats failed - $@") if $@;
 }
 
+sub write_dynamic_service_stats {
+    my ($self, $stats) = @_;
+
+    my $filename = "$self->{statusdir}/dynamic_service_stats";
+    eval { PVE::HA::Tools::write_json_to_file($filename, $stats) };
+    $self->log('error', "writing dynamic service stats failed - $@") if $@;
+}
+
 sub new {
     my ($this, $testdir) = @_;
 
@@ -536,6 +576,18 @@ sub new {
         $self->write_static_service_stats($stats);
     }
 
+    if (-f "$testdir/dynamic_service_stats") {
+        copy("$testdir/dynamic_service_stats", "$statusdir/dynamic_service_stats");
+    } else {
+        my $services = $self->read_static_service_stats();
+        my $stats = {
+            map { $_ => { cpu => $default_service_cpu, mem => $default_service_mem } }
+                keys %$services
+        };
+
+        $self->write_dynamic_service_stats($stats);
+    }
+
     my $cstatus = $self->read_hardware_status_nolock();
 
     foreach my $node (sort keys %$cstatus) {
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 25/40] sim: hardware: add set-dynamic-stats command
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (23 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 24/40] sim: hardware: handle dynamic service stats Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

Add a command to set dynamic service stats, and handle the
set-dynamic-stats and set-static-stats commands analogously.
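
The pair-wise parameter parsing introduced below relies on Perl's
multi-value foreach; a minimal standalone sketch with hypothetical
input values:

```perl
use v5.36;
use experimental 'for_list'; # stable since Perl 5.40

my @params = qw(cpu 1.5 mem 512);

my $conversions = {
    cpu => sub { 0.0 + $_[0] },      # cores
    mem => sub { $_[0] * 1024**2 },  # MiB -> bytes
};

my %new_stats;
# iterate over (target, value) pairs, two elements at a time
for my ($target, $val) (@params) {
    my $convert = $conversions->{$target}
        or die "unknown target stat '$target'";
    $new_stats{$target} = $convert->($val);
}

say "$new_stats{cpu} $new_stats{mem}"; # 1.5 536870912
```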

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- rebase on master to include new crm command 'manual-migrate'

 src/PVE/HA/Sim/Hardware.pm   | 34 ++++++++++++++++++++++++++--------
 src/PVE/HA/Sim/RTHardware.pm |  4 +++-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 3dda0c0f..b4000cfd 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -795,7 +795,8 @@ sub get_cfs_state {
 #   service <sid> stop <timeout>
 #   service <sid> lock/unlock [lockname]
 #   service <sid> add <node> [<request-state=started>] [<running=0>]
-#   service <sid> set-static-stats <maxcpu> <maxmem>
+#   service <sid> set-static-stats  [maxcpu <cores>] [maxmem <MiB>]
+#   service <sid> set-dynamic-stats [cpu <cores>] [mem <MiB>]
 #   service <sid> delete
 sub sim_hardware_cmd {
     my ($self, $cmdstr, $logid) = @_;
@@ -954,15 +955,32 @@ sub sim_hardware_cmd {
                     $params[2] || 0,
                 );
 
-            } elsif ($action eq 'set-static-stats') {
-                die "sim_hardware_cmd: missing maxcpu for '$action' command" if !$params[0];
-                die "sim_hardware_cmd: missing maxmem for '$action' command" if !$params[1];
+            } elsif ($action eq 'set-static-stats' || $action eq 'set-dynamic-stats') {
+                die "sim_hardware_cmd: missing target stat for '$action' command"
+                    if !@params;
 
-                $self->set_static_service_stats(
-                    $sid,
-                    { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
-                );
+                my $conversions =
+                    $action eq 'set-static-stats'
+                    ? { maxcpu => sub { 0.0 + $_[0] }, maxmem => sub { $_[0] * 1024**2 } }
+                    : { cpu => sub { 0.0 + $_[0] }, mem => sub { $_[0] * 1024**2 } };
 
+                my %new_stats;
+                for my ($target, $val) (@params) {
+                    die "sim_hardware_cmd: missing value for '$action $target' command"
+                        if !defined($val);
+
+                    my $convert = $conversions->{$target}
+                        or die
+                        "sim_hardware_cmd: unknown target stat '$target' for '$action' command";
+
+                    $new_stats{$target} = $convert->($val);
+                }
+
+                if ($action eq 'set-static-stats') {
+                    $self->set_static_service_stats($sid, \%new_stats);
+                } else {
+                    $self->set_dynamic_service_stats($sid, \%new_stats);
+                }
             } elsif ($action eq 'manual-migrate') {
 
                 die "sim_hardware_cmd: missing target node for '$action' command"
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 9a83d098..9528f542 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -532,7 +532,9 @@ sub show_service_add_dialog {
 
         my $maxcpu = $cpu_count_spin->get_value();
         my $maxmem = $memory_spin->get_value();
-        $self->sim_hardware_cmd("service $sid set-static-stats $maxcpu $maxmem", 'command');
+        $self->sim_hardware_cmd(
+            "service $sid set-static-stats maxcpu $maxcpu maxmem $maxmem", 'command',
+        );
 
         $self->add_service_to_gui($sid);
     }
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 26/40] sim: hardware: add getters for dynamic {node,service} stats
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (24 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 27/40] usage: pass service data to add_service_usage Daniel Kral
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

From: Dominik Rusovac <d.rusovac@proxmox.com>

Aggregation of dynamic node stats is lazy.

Getters log at warning level when stats are overcommitted.
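
The merge of static and dynamic per-service stats in the hunk below
uses postfix hash expansion, where on overlapping keys the values from
the later (dynamic) list would win; roughly:

```perl
use v5.24; # postfix dereference

my $static_stats  = { maxcpu => 4.0, maxmem => 4096 * 1024**2 };
my $dynamic_stats = { cpu => 1.5, mem => 512 * 1024**2 };

# flatten both hashrefs into one new hashref; keys from the second
# list override duplicates from the first
my $usage = { $static_stats->%*, $dynamic_stats->%* };

say for sort keys %$usage; # prints cpu, maxcpu, maxmem, mem (one per line)
```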

Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- add comment about get_dynamic_node_stats() excluding non-running
  resources from the summation

 src/PVE/HA/Sim/Env.pm      | 12 ++++++++
 src/PVE/HA/Sim/Hardware.pm | 61 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index ad51245c..65d4efad 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -500,12 +500,24 @@ sub get_static_service_stats {
     return $self->{hardware}->get_static_service_stats();
 }
 
+sub get_dynamic_service_stats {
+    my ($self) = @_;
+
+    return $self->{hardware}->get_dynamic_service_stats();
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
     return $self->{hardware}->get_static_node_stats();
 }
 
+sub get_dynamic_node_stats {
+    my ($self) = @_;
+
+    return $self->{hardware}->get_dynamic_node_stats();
+}
+
 sub get_node_version {
     my ($self, $node) = @_;
 
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index b4000cfd..5693df0f 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1232,6 +1232,27 @@ sub get_static_service_stats {
     return $stats;
 }
 
+sub get_dynamic_service_stats {
+    my ($self) = @_;
+
+    my $stats = get_cluster_service_stats($self);
+    my $static_stats = $self->read_static_service_stats();
+    my $dynamic_stats = $self->read_dynamic_service_stats();
+
+    for my $sid (keys %$stats) {
+        $stats->{$sid}->{usage} = {
+            $static_stats->{$sid}->%*, $dynamic_stats->{$sid}->%*,
+        };
+
+        $self->log('warning', "overcommitted cpu on '$sid'")
+            if $stats->{$sid}->{usage}->{cpu} > $stats->{$sid}->{usage}->{maxcpu};
+        $self->log('warning', "overcommitted mem on '$sid'")
+            if $stats->{$sid}->{usage}->{mem} > $stats->{$sid}->{usage}->{maxmem};
+    }
+
+    return $stats;
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
@@ -1245,6 +1266,46 @@ sub get_static_node_stats {
     return $stats;
 }
 
+sub get_dynamic_node_stats {
+    my ($self) = @_;
+
+    my $stats = $self->get_static_node_stats();
+    for my $node (keys %$stats) {
+        $stats->{$node}->{maxcpu} = $stats->{$node}->{maxcpu} // $default_node_maxcpu;
+        $stats->{$node}->{cpu} = $stats->{$node}->{cpu} // 0.0;
+        $stats->{$node}->{maxmem} = $stats->{$node}->{maxmem} // $default_node_maxmem;
+        $stats->{$node}->{mem} = $stats->{$node}->{mem} // 0;
+    }
+
+    my $service_conf = $self->read_service_config();
+    my $dynamic_service_stats = $self->get_dynamic_service_stats();
+
+    my $cstatus = $self->read_hardware_status_nolock();
+    my $node_service_status = { map { $_ => $self->read_service_status($_) } keys %$cstatus };
+
+    for my $sid (keys %$service_conf) {
+        my $node = $service_conf->{$sid}->{node};
+
+        # only add the dynamic load usage to node if service is actually marked
+        # as running by the node service status written by the LRM
+        if ($node_service_status->{$node}->{$sid}) {
+            my ($cpu, $mem) = $dynamic_service_stats->{$sid}->{usage}->@{qw(cpu mem)};
+
+            die "unknown cpu load for '$sid'" if !defined($cpu);
+            $stats->{$node}->{cpu} += $cpu;
+            $self->log('warning', "overcommitted cpu on '$node'")
+                if $stats->{$node}->{cpu} > $stats->{$node}->{maxcpu};
+
+            die "unknown memory usage for '$sid'" if !defined($mem);
+            $stats->{$node}->{mem} += $mem;
+            $self->log('warning', "overcommitted mem on '$node'")
+                if $stats->{$node}->{mem} > $stats->{$node}->{maxmem};
+        }
+    }
+
+    return $stats;
+}
+
 sub get_node_version {
     my ($self, $node) = @_;
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 27/40] usage: pass service data to add_service_usage
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (25 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The method already depends on three members of the service data, and a
following patch needs a fourth member to add more information to the
Usage implementations.
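
For reference, the hash-ref slice syntax this patch leans on behaves as
follows (illustrative service data):

```perl
use v5.24; # postfix dereference slices

my $sd = { state => 'started', node => 'node1', target => undef, uid => 'abc123' };

# value slice: pick several members at once, in order
my ($state, $node, $target) = $sd->@{qw(state node target)};
say "$state on $node"; # started on node1

# key/value slice: build a subset hashref, as done for non-HA resources
my $subset = { $sd->%{qw(node state)} };
say join(',', sort keys %$subset); # node,state
```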

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- fix $service_stats->%{qw(...)} to $service_stats->{$sid}->%{qw(...)}
  as reported by @Dominik

 src/PVE/HA/Manager.pm | 11 +++++------
 src/PVE/HA/Usage.pm   |  6 +++---
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index fbc7f931..71f45b5c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -284,17 +284,17 @@ sub recompute_online_node_usage {
     foreach my $sid (sort keys %{ $self->{ss} }) {
         my $sd = $self->{ss}->{$sid};
 
-        $online_node_usage->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+        $online_node_usage->add_service_usage($sid, $sd);
     }
 
     # add remaining non-HA resources to online node usage
     for my $sid (sort keys %$service_stats) {
         next if $self->{ss}->{$sid};
 
-        my ($node, $state) = $service_stats->{$sid}->@{qw(node state)};
-
         # the migration target is not known for non-HA resources
-        $online_node_usage->add_service_usage($sid, $state, $node, undef);
+        my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+
+        $online_node_usage->add_service_usage($sid, $sd);
     }
 
     $self->{online_node_usage} = $online_node_usage;
@@ -332,8 +332,7 @@ my $change_service_state = sub {
     }
 
     $self->{online_node_usage}->remove_service_usage($sid);
-    $self->{online_node_usage}
-        ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+    $self->{online_node_usage}->add_service_usage($sid, $sd);
 
     $sd->{uid} = compute_new_uuid($new_state);
 
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 9f19a82b..6d53f956 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -40,12 +40,12 @@ sub add_service_usage_to_node {
     die "implement in subclass";
 }
 
-# Adds service $sid's usage to the online nodes according to their $state,
-# $service_node and $migration_target.
+# Adds service $sid's usage to the online nodes according to their service data $sd.
 sub add_service_usage {
-    my ($self, $sid, $service_state, $service_node, $migration_target) = @_;
+    my ($self, $sid, $sd) = @_;
 
     my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
+    my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
     my ($current_node, $target_node) =
         get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 28/40] usage: pass service data to get_used_service_nodes
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (26 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 27/40] usage: pass service data to add_service_usage Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats Daniel Kral
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Remove some unnecessary destructuring syntax for the helper.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- reword documentation of get_used_service_nodes() as suggested by
  @Dominik

 src/PVE/HA/Rules/ResourceAffinity.pm |  3 +--
 src/PVE/HA/Usage.pm                  | 13 ++++++-------
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 1c610430..474d3000 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -511,8 +511,7 @@ sub get_resource_affinity {
     my $get_used_service_nodes = sub {
         my ($sid) = @_;
         return (undef, undef) if !defined($ss->{$sid});
-        my ($state, $node, $target) = $ss->{$sid}->@{qw(state node target)};
-        return PVE::HA::Usage::get_used_service_nodes($online_nodes, $state, $node, $target);
+        return PVE::HA::Usage::get_used_service_nodes($online_nodes, $ss->{$sid});
     };
 
     for my $csid (keys $positive->%*) {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 6d53f956..be3e64d6 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -45,9 +45,7 @@ sub add_service_usage {
     my ($self, $sid, $sd) = @_;
 
     my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
-    my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
-    my ($current_node, $target_node) =
-        get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
+    my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
 
     $self->add_service_usage_to_node($current_node, $sid) if $current_node;
     $self->add_service_usage_to_node($target_node, $sid) if $target_node;
@@ -66,11 +64,12 @@ sub score_nodes_to_start_service {
     die "implement in subclass";
 }
 
-# Returns the current and target node as a two-element array, that a service
-# puts load on according to the $online_nodes and the service's $state, $node
-# and $target.
+# Returns a two-element array of the nodes a service puts load on
+# (current and target), given $online_nodes and service data $sd.
 sub get_used_service_nodes {
-    my ($online_nodes, $state, $node, $target) = @_;
+    my ($online_nodes, $sd) = @_;
+
+    my ($state, $node, $target) = $sd->@{qw(state node target)};
 
     return (undef, undef) if $state eq 'stopped' || $state eq 'request_start';
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (27 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  7:58   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes Daniel Kral
                   ` (12 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The running flag is needed to discriminate starting from started
resources, and it is a required parameter of the new add_service(...)
method of the resource scheduling bindings.

The HA Manager tracks whether HA resources are in 'started' state and
whether the LRM acknowledged that these are running. For non-HA
resources, the rrd_dump data contains a running flag for VM and CT
guests.

See the next patch for the usage implementations, which passes the
running flag to the add_service(...) method, for more information about
the details.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- change the patch message and summary such that it is clear that this
  patch is only really relevant for making non-HA cluster resources also
  carry the running flag

 src/PVE/HA/Env/PVE2.pm     | 1 +
 src/PVE/HA/Manager.pm      | 2 +-
 src/PVE/HA/Sim/Hardware.pm | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index b2488ddd..97291594 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -549,6 +549,7 @@ my sub get_cluster_service_stats {
             id => $id,
             node => $nodename,
             state => $state,
+            running => $state eq 'started',
             type => $type,
             usage => {},
         };
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 71f45b5c..5b2715c7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -292,7 +292,7 @@ sub recompute_online_node_usage {
         next if $self->{ss}->{$sid};
 
         # the migration target is not known for non-HA resources
-        my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+        my $sd = { $service_stats->{$sid}->%{qw(node state running)} };
 
         $online_node_usage->add_service_usage($sid, $sd);
     }
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 5693df0f..986cb084 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1201,6 +1201,7 @@ my sub get_cluster_service_stats {
         $stats->{$sid} = {
             node => $cfg->{node},
             state => $cfg->{state},
+            running => $cfg->{state} eq 'started',
             usage => {},
         };
     }
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (28 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  8:12   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler Daniel Kral
                   ` (11 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The pve_static (and upcoming pve_dynamic) bindings expose the new
add_resource(...) method, which allows adding resources in a single call
with the additional running flag.

The running flag is needed to discriminate starting and started HA
resources from each other, which is needed to correctly account for HA
resources for the dynamic load usage implementation in the next patch.

This is because, for the dynamic load usage, any HA resource that is
scheduled to start by the HA Manager in the same round will not be
accounted for in the next call to score_nodes_to_start_resource(...).
This is not a problem for the static load usage, because there the
current node usage is derived from the started resources on every call
anyway.

Passing only the HA resources' 'state' property is not enough, since
the HA Manager will move any HA resource from the 'request_start' state
(or through other transient states such as 'request_start_balance' and a
successful 'migrate'/'relocate') into the 'started' state.

This 'started' state is then picked up by the HA resource's LRM, which
actually starts the HA resource and, if successful, responds with a
'SUCCESS' LRM result. Only then will the HA Manager acknowledge this by
adding the running flag to the HA resource's state.
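
The condition this boils down to, mirroring the check added to
add_service_usage() in the hunk below, can be exercised standalone:

```perl
use strict;
use warnings;

sub is_running {
    my ($sd, $current_node) = @_;

    # in 'started' state, trust the LRM-acknowledged running flag;
    # in any other state, the service counts as running as soon as it
    # occupies a current node
    return ($sd->{state} eq 'started' && ($sd->{running} // 0))
        || ($sd->{state} ne 'started' && defined($current_node));
}

# started, but the LRM has not yet confirmed the start -> not running
print is_running({ state => 'started', running => 0 }, 'node1') ? "1\n" : "0\n"; # 0
# transient state (e.g. 'migrate') while occupying a node -> running
print is_running({ state => 'migrate' }, 'node1') ? "1\n" : "0\n"; # 1
```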

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- fix setting the $running flag only if the state is 'started' and
  running is set, or for any non-started state where current_node is set
- additionally handle the case where only $target_node is set in
  add_service(), which can only happen in specific cases; I have some
  patches which inline this behavior in get_used_service_nodes() (should
  be renamed later) to make this behavior more concise, but that should
  be handled separately
- change the $service property names to kebab-case

 src/PVE/HA/Usage.pm        | 13 ++++++++-----
 src/PVE/HA/Usage/Basic.pm  |  9 ++++++++-
 src/PVE/HA/Usage/Static.pm | 30 ++++++++++++++++++++++++------
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index be3e64d6..43feb041 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,9 +33,8 @@ sub contains_node {
     die "implement in subclass";
 }
 
-# Logs a warning to $haenv upon failure, but does not die.
-sub add_service_usage_to_node {
-    my ($self, $nodename, $sid) = @_;
+sub add_service {
+    my ($self, $sid, $current_node, $target_node, $running) = @_;
 
     die "implement in subclass";
 }
@@ -47,8 +46,12 @@ sub add_service_usage {
     my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
     my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
 
-    $self->add_service_usage_to_node($current_node, $sid) if $current_node;
-    $self->add_service_usage_to_node($target_node, $sid) if $target_node;
+    # some usage implementations need to discern whether a service is truly running;
+    # a service does only have the 'running' flag in 'started' state
+    my $running = ($sd->{state} eq 'started' && $sd->{running})
+        || ($sd->{state} ne 'started' && defined($current_node));
+
+    $self->add_service($sid, $current_node, $target_node, $running);
 }
 
 sub remove_service_usage {
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 2584727b..5aa3ac05 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -38,7 +38,7 @@ sub contains_node {
     return defined($self->{nodes}->{$nodename});
 }
 
-sub add_service_usage_to_node {
+my sub add_service_usage_to_node {
     my ($self, $nodename, $sid) = @_;
 
     if ($self->contains_node($nodename)) {
@@ -51,6 +51,13 @@ sub add_service_usage_to_node {
     }
 }
 
+sub add_service {
+    my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+    add_service_usage_to_node($self, $current_node, $sid) if defined($current_node);
+    add_service_usage_to_node($self, $target_node, $sid) if defined($target_node);
+}
+
 sub remove_service_usage {
     my ($self, $sid) = @_;
 
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index b60f5000..8c7a614b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -71,17 +71,35 @@ my sub get_service_usage {
     return $service_stats;
 }
 
-sub add_service_usage_to_node {
-    my ($self, $nodename, $sid) = @_;
+sub add_service {
+    my ($self, $sid, $current_node, $target_node, $running) = @_;
 
-    $self->{'node-services'}->{$nodename}->{$sid} = 1;
+    # do not add service which do not put any usage on the nodes
+    return if !defined($current_node) && !defined($target_node);
+
+    # PVE::RS::ResourceScheduling::Static::add_service() expects $current_node
+    # to be set, so consider $target_node as $current_node for unset $current_node;
+    #
+    # currently, this happens for the request_start_balance service state and if
+    # node maintenance causes services to migrate to other nodes
+    if (!defined($current_node)) {
+        $current_node = $target_node;
+        undef $target_node;
+    }
 
     eval {
         my $service_usage = get_service_usage($self, $sid);
-        $self->{scheduler}->add_service_usage_to_node($nodename, $sid, $service_usage);
+
+        my $service = {
+            stats => $service_usage,
+            running => $running,
+            'current-node' => $current_node,
+            'target-node' => $target_node,
+        };
+
+        $self->{scheduler}->add_service($sid, $service);
     };
-    $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
-        if $@;
+    $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
 }
 
 sub remove_service_usage {
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (29 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  8:15   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases Daniel Kral
                   ` (10 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The dynamic usage scheduler allows the HA Manager to make scheduling
decisions based on the current usage of the nodes and cluster resources
in addition to the maximum usage stats as reported by the PVE::HA::Env
implementation.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- add !defined(...) guards in PVE::HA::Usage::Dynamic::add_node(...) as
  suggested by @Dominik
- adapt comment in add_service() as suggested by @Dominik
- did not change error messages as suggested by @Dominik, because these
  are consistent with the one for Static; should be done in a separate
  patch (series)

 debian/pve-ha-manager.install |   1 +
 src/PVE/HA/Env.pm             |  12 ++++
 src/PVE/HA/Manager.pm         |  21 ++++++
 src/PVE/HA/Usage/Dynamic.pm   | 122 ++++++++++++++++++++++++++++++++++
 src/PVE/HA/Usage/Makefile     |   2 +-
 5 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/HA/Usage/Dynamic.pm

diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 38d5d60b..75220a0b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -42,6 +42,7 @@
 /usr/share/perl5/PVE/HA/Usage.pm
 /usr/share/perl5/PVE/HA/Usage/Basic.pm
 /usr/share/perl5/PVE/HA/Usage/Static.pm
+/usr/share/perl5/PVE/HA/Usage/Dynamic.pm
 /usr/share/perl5/PVE/Service/pve_ha_crm.pm
 /usr/share/perl5/PVE/Service/pve_ha_lrm.pm
 /usr/share/pve-manager/templates/default/fencing-body.html.hbs
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 3643292e..44c26854 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -312,12 +312,24 @@ sub get_static_service_stats {
     return $self->{plug}->get_static_service_stats();
 }
 
+sub get_dynamic_service_stats {
+    my ($self) = @_;
+
+    return $self->{plug}->get_dynamic_service_stats();
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
     return $self->{plug}->get_static_node_stats();
 }
 
+sub get_dynamic_node_stats {
+    my ($self) = @_;
+
+    return $self->{plug}->get_dynamic_node_stats();
+}
+
 sub get_node_version {
     my ($self, $node) = @_;
 
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 5b2715c7..c60ab595 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -21,6 +21,12 @@ eval {
     $have_static_scheduling = 1;
 };
 
+my $have_dynamic_scheduling;
+eval {
+    require PVE::HA::Usage::Dynamic;
+    $have_dynamic_scheduling = 1;
+};
+
 ## Variable Name & Abbreviations Convention
 #
 # The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -267,6 +273,21 @@ sub recompute_online_node_usage {
                 'warning',
                 "fallback to 'basic' scheduler mode, init for 'static' failed - $@",
             ) if $@;
+        } elsif ($mode eq 'dynamic') {
+            if ($have_dynamic_scheduling) {
+                $online_node_usage = eval {
+                    $service_stats = $haenv->get_dynamic_service_stats();
+                    my $scheduler = PVE::HA::Usage::Dynamic->new($haenv, $service_stats);
+                    $scheduler->add_node($_) for $online_nodes->@*;
+                    return $scheduler;
+                };
+            } else {
+                $@ = "dynamic scheduling not available\n";
+            }
+            $haenv->log(
+                'warning',
+                "fallback to 'basic' scheduler mode, init for 'dynamic' failed - $@",
+            ) if $@;
         } elsif ($mode eq 'basic') {
             # handled below in the general fall-back case
         } else {
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
new file mode 100644
index 00000000..24c85a41
--- /dev/null
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -0,0 +1,122 @@
+package PVE::HA::Usage::Dynamic;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Dynamic;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+    my ($class, $haenv, $service_stats) = @_;
+
+    my $node_stats = eval { $haenv->get_dynamic_node_stats() };
+    die "did not get dynamic node usage information - $@" if $@;
+
+    my $scheduler = eval { PVE::RS::ResourceScheduling::Dynamic->new() };
+    die "unable to initialize dynamic scheduling - $@" if $@;
+
+    return bless {
+        'node-stats' => $node_stats,
+        'service-stats' => $service_stats,
+        haenv => $haenv,
+        scheduler => $scheduler,
+    }, $class;
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    my $stats = $self->{'node-stats'}->{$nodename}
+        or die "did not get dynamic node usage information for '$nodename'\n";
+    die "dynamic node usage information for '$nodename' missing cpu count\n"
+        if !defined($stats->{maxcpu});
+    die "dynamic node usage information for '$nodename' missing memory\n"
+        if !defined($stats->{maxmem});
+
+    eval { $self->{scheduler}->add_node($nodename, $stats); };
+    die "initializing dynamic node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+    my ($self, $sid) = @_;
+
+    my $service_stats = $self->{'service-stats'}->{$sid}->{usage}
+        or die "did not get dynamic service usage information for '$sid'\n";
+
+    return $service_stats;
+}
+
+sub add_service {
+    my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+    # do not add a service that does not put any usage on the nodes
+    return if !defined($current_node) && !defined($target_node);
+
+    # PVE::RS::ResourceScheduling::Dynamic::add_resource() expects $current_node
+    # to be set, so use $target_node as $current_node when $current_node is unset;
+    #
+    # currently, this happens for the request_start_balance service state and if
+    # node maintenance causes services to migrate to other nodes
+    if (!defined($current_node)) {
+        $current_node = $target_node;
+        undef $target_node;
+    }
+
+    eval {
+        my $service_usage = get_service_usage($self, $sid);
+
+        my $service = {
+            stats => $service_usage,
+            running => $running,
+            'current-node' => $current_node,
+            'target-node' => $target_node,
+        };
+
+        $self->{scheduler}->add_resource($sid, $service);
+    };
+    $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
+}
+
+sub remove_service_usage {
+    my ($self, $sid) = @_;
+
+    eval { $self->{scheduler}->remove_resource($sid) };
+    $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
+}
+
+sub score_nodes_to_start_service {
+    my ($self, $sid) = @_;
+
+    my $score_list = eval {
+        my $service_usage = get_service_usage($self, $sid);
+        $self->{scheduler}->score_nodes_to_start_resource($service_usage);
+    };
+    $self->{haenv}
+        ->log('err', "unable to score nodes according to dynamic usage for service '$sid' - $@")
+        if $@;
+
+    # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+    return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index befdda60..5d51a9c1 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,5 +1,5 @@
 SIM_SOURCES=Basic.pm
-SOURCES=${SIM_SOURCES} Static.pm
+SOURCES=${SIM_SOURCES} Static.pm Dynamic.pm
 
 .PHONY: install
 install:
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (30 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  8:20   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
                   ` (9 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases document the basic behavior of the scheduler using the
dynamic usage information of the HA resources, once with
rebalance-on-start disabled and once with it enabled.

As the mechanisms for the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:

- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
  correctly simulates that the previously scheduled HA resources are
  already started

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- adapt the test-crs-dynamic-rebalance1 test case to correctly handle
  the change in score_nodes_to_start_resource() as done by changing how
  $running is set in add_service_usage() a few patches prior

 src/test/test-crs-dynamic-rebalance1/README   |  3 +
 src/test/test-crs-dynamic-rebalance1/cmdlist  |  4 +
 .../datacenter.cfg                            |  7 ++
 .../dynamic_service_stats                     |  7 ++
 .../hardware_status                           |  5 ++
 .../test-crs-dynamic-rebalance1/log.expect    | 82 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  7 ++
 .../static_service_stats                      |  7 ++
 src/test/test-crs-dynamic1/README             |  4 +
 src/test/test-crs-dynamic1/cmdlist            |  4 +
 src/test/test-crs-dynamic1/datacenter.cfg     |  6 ++
 .../test-crs-dynamic1/dynamic_service_stats   |  3 +
 src/test/test-crs-dynamic1/hardware_status    |  5 ++
 src/test/test-crs-dynamic1/log.expect         | 51 ++++++++++++
 src/test/test-crs-dynamic1/manager_status     |  1 +
 src/test/test-crs-dynamic1/service_config     |  3 +
 .../test-crs-dynamic1/static_service_stats    |  3 +
 18 files changed, 203 insertions(+)
 create mode 100644 src/test/test-crs-dynamic-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic1/README
 create mode 100644 src/test/test-crs-dynamic1/cmdlist
 create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic1/hardware_status
 create mode 100644 src/test/test-crs-dynamic1/log.expect
 create mode 100644 src/test/test-crs-dynamic1/manager_status
 create mode 100644 src/test/test-crs-dynamic1/service_config
 create mode 100644 src/test/test-crs-dynamic1/static_service_stats

diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how after a failed node the recovery gets
+balanced out for a small batch of HA resources with the dynamic usage
+information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-rebalance-on-start": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+    "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+    "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+    "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+    "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..5c8b050c
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,82 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: adding new service 'vm:104' on node 'node3'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: service vm:101: re-balance selected new node node1 for startup
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance'  (node = node3, target = node1)
+info     20    node1/crm: service vm:102: re-balance selected new node node2 for startup
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance'  (node = node3, target = node2)
+info     20    node1/crm: service vm:103: re-balance selected current node node3 for startup
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service vm:104: re-balance selected new node node1 for startup
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance'  (node = node3, target = node1)
+info     20    node1/crm: service vm:105: re-balance selected new node node2 for startup
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance'  (node = node3, target = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: service vm:101 - start relocate to node 'node1'
+info     25    node3/lrm: service vm:101 - end relocate to node 'node1'
+info     25    node3/lrm: service vm:102 - start relocate to node 'node2'
+info     25    node3/lrm: service vm:102 - end relocate to node 'node2'
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+info     25    node3/lrm: service vm:104 - start relocate to node 'node1'
+info     25    node3/lrm: service vm:104 - end relocate to node 'node1'
+info     25    node3/lrm: service vm:105 - start relocate to node 'node2'
+info     25    node3/lrm: service vm:105 - end relocate to node 'node2'
+info     40    node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started'  (node = node1)
+info     40    node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started'  (node = node2)
+info     40    node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started'  (node = node1)
+info     40    node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started'  (node = node2)
+info     41    node1/lrm: starting service vm:101
+info     41    node1/lrm: service status vm:101 started
+info     41    node1/lrm: starting service vm:104
+info     41    node1/lrm: service status vm:104 started
+info     43    node2/lrm: starting service vm:102
+info     43    node2/lrm: service status vm:102 started
+info     43    node2/lrm: starting service vm:105
+info     43    node2/lrm: service status vm:105 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'vm:103': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service vm:103
+info    241    node1/lrm: service status vm:103 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" },
+    "vm:104": { "node": "node3", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+    "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+    "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+    "crs": {
+        "ha": "dynamic"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute network node1 off
+info    120    node1/crm: status change master => lost_manager_lock
+info    120    node1/crm: status change lost_manager_lock => wait_for_quorum
+info    121    node1/lrm: status change active => lost_agent_lock
+info    162     watchdog: execute power node1 off
+info    161    node1/crm: killed by poweroff
+info    162    node1/lrm: killed by poweroff
+info    162     hardware: server 'node1' stopped by poweroff (watchdog)
+info    222    node3/crm: got lock 'ha_manager_lock'
+info    222    node3/crm: status change slave => master
+info    222    node3/crm: using scheduler mode 'dynamic'
+info    222    node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info    282    node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    282    node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai    282    node3/crm: FENCE: Try to fence node 'node1'
+info    282    node3/crm: got lock 'ha_agent_node1_lock'
+info    282    node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai    282    node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    282    node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info    282    node3/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node3)
+info    283    node3/lrm: got lock 'ha_agent_node3_lock'
+info    283    node3/lrm: status change wait_for_agent_lock => active
+info    283    node3/lrm: starting service vm:102
+info    283    node3/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 33/40] manager: rename execute_migration to queue_resource_motion
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (31 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

The name is misleading: the method does not execute the HA resource
migration itself, but only queues the HA resource to change into the
'migrate' or 'relocate' state, which the respective LRM then picks up
and executes.

The term 'resource motion' also generalizes over the different actions
implied by the 'migrate' and 'relocate' commands and states.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Manager.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c60ab595..c8a1a35b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -419,7 +419,7 @@ sub read_lrm_status {
     return ($results, $modes);
 }
 
-sub execute_migration {
+sub queue_resource_motion {
     my ($self, $cmd, $task, $sid, $target) = @_;
 
     my ($haenv, $ss) = $self->@{qw(haenv ss)};
@@ -488,7 +488,7 @@ sub update_crm_commands {
                             "ignore crm command - service already on target node: $cmd",
                         );
                     } else {
-                        $self->execute_migration($cmd, $task, $sid, $node);
+                        $self->queue_resource_motion($cmd, $task, $sid, $node);
                     }
                 }
             } else {
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 34/40] manager: update_crs_scheduler_mode: factor out crs config
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (32 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Manager.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c8a1a35b..2576c762 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -94,11 +94,12 @@ sub update_crs_scheduler_mode {
 
     my $haenv = $self->{haenv};
     my $dc_cfg = $haenv->get_datacenter_settings();
+    my $crs_cfg = $dc_cfg->{crs};
 
-    $self->{crs}->{rebalance_on_request_start} = !!$dc_cfg->{crs}->{'ha-rebalance-on-start'};
+    $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
 
     my $old_mode = $self->{crs}->{scheduler};
-    my $new_mode = $dc_cfg->{crs}->{ha} || 'basic';
+    my $new_mode = $crs_cfg->{ha} || 'basic';
 
     if (!defined($old_mode)) {
         $haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (33 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  9:07   ` Dominik Rusovac
                     ` (2 more replies)
  2026-03-30 14:30 ` [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases Daniel Kral
                   ` (6 subsequent siblings)
  41 siblings, 3 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

If the automatic load balancing system is enabled, it checks whether the
cluster node imbalance exceeds a user-defined threshold. If the threshold
is exceeded for a user-defined number of consecutive HA Manager rounds
(the "hold duration"), it chooses the best resource motion to improve the
cluster node imbalance and queues it, but only if the motion improves the
imbalance by at least a user-defined margin.
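
The gating described above can be sketched as follows (illustrative
Python, not the actual Perl implementation; the default values mirror
the ones introduced by this patch):

```python
def should_rebalance(imbalance, state, threshold=0.7, hold_duration=3):
    # Count consecutive HA Manager rounds in which the cluster node
    # imbalance exceeded the threshold; reset on any balanced round.
    if imbalance > threshold:
        state['sustained_rounds'] += 1
    else:
        state['sustained_rounds'] = 0
    return state['sustained_rounds'] >= hold_duration

def pick_motion(current_imbalance, candidates, margin=0.1):
    # Choose the candidate motion with the lowest resulting imbalance,
    # but only queue it if it improves on the current imbalance by at
    # least the user-defined margin.
    best = min(candidates, key=lambda c: c['imbalance'], default=None)
    if best and current_imbalance - best['imbalance'] >= margin:
        return best
    return None
```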

This patch introduces resource bundles, which ensure that HA resources
in strict positive resource affinity rules are considered as a whole
"bundle" instead of individual HA resources.

Specifically, a resource bundle is considered active and stationary if it
has at least one running resource and all of its resources are located on
the same node. This distinction is needed because newly created strict
positive resource affinity rules may still require some resource motions
to enforce the rule.
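
A minimal sketch of the active-and-stationary check (illustrative Python
over a hypothetical service table, not the actual Perl code):

```python
def is_active_stationary(bundle, services):
    # A bundle qualifies if all of its resources sit on the same node
    # and at least one of them is currently running.
    nodes = {services[sid]['node'] for sid in bundle}
    running = any(services[sid]['running'] for sid in bundle)
    return running and len(nodes) == 1
```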

Additionally, the migration candidate generation prunes any target nodes
that do not adhere to the HA rules of these resource bundles before
scoring the migration candidates.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/PVE/HA/Manager.pm       | 179 +++++++++++++++++++++++++++++++++++-
 src/PVE/HA/Usage.pm         |  34 +++++++
 src/PVE/HA/Usage/Dynamic.pm |  33 +++++++
 src/PVE/HA/Usage/Static.pm  |  33 +++++++
 4 files changed, 278 insertions(+), 1 deletion(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 2576c762..0f8a03a6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -59,10 +59,17 @@ sub new {
 
     my $self = bless {
         haenv => $haenv,
-        crs => {},
+        crs => {
+            auto_rebalance => {},
+        },
         last_rules_digest => '',
         last_groups_digest => '',
         last_services_digest => '',
+        # used to track how many HA rounds the imbalance threshold has been exceeded
+        #
+        # this is not persisted across a CRM failover, as in the meantime
+        # the usage statistics might have changed quite a bit already
+        sustained_imbalance_round => 0,
         group_migration_round => 3, # wait a little bit
     }, $class;
 
@@ -97,6 +104,13 @@ sub update_crs_scheduler_mode {
     my $crs_cfg = $dc_cfg->{crs};
 
     $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
+    $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
+    $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.7;
+    $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
+        // 'bruteforce';
+    $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
+        // 3;
+    $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
 
     my $old_mode = $self->{crs}->{scheduler};
     my $new_mode = $crs_cfg->{ha} || 'basic';
@@ -114,6 +128,150 @@ sub update_crs_scheduler_mode {
     return;
 }
 
+# Returns a hash of lists containing the running, non-moving HA resource
+# bundles whose resources are on the same node, as implied by the strict
+# positive resource affinity rules.
+#
+# Each resource bundle has a leader, which is the alphabetically first running
+# HA resource in the resource bundle and also the key of each resource bundle
+# in the returned hash.
+sub get_active_stationary_resource_bundles {
+    my ($ss, $resource_affinity) = @_;
+
+    my $resource_bundles = {};
+OUTER: for my $sid (sort keys %$ss) {
+        # do not consider non-started resource as 'active' leading resource
+        next if $ss->{$sid}->{state} ne 'started';
+
+        my @resources = ($sid);
+        my $nodes = { $ss->{$sid}->{node} => 1 };
+
+        my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
+        if (%$dependent_resources) {
+            for my $csid (keys %$dependent_resources) {
+                next if !defined($ss->{$csid});
+                my ($state, $node) = $ss->{$csid}->@{qw(state node)};
+
+                # do not consider stationary bundle if a dependent resource moves
+                next OUTER if $state eq 'migrate' || $state eq 'relocate';
+                # do not add non-started resource to active bundle
+                next if $state ne 'started';
+
+                $nodes->{$node} = 1;
+
+                push @resources, $csid;
+            }
+
+            @resources = sort @resources;
+        }
+
+        # skip resource bundles that are not on the same node yet
+        next if keys %$nodes > 1;
+
+        my $leader_sid = $resources[0];
+
+        $resource_bundles->{$leader_sid} = \@resources;
+    }
+
+    return $resource_bundles;
+}
+
+# Returns a hash of hashes, where each item contains the resource bundle's
+# leader, the list of HA resources in the resource bundle, and the list of
+# possible nodes to migrate to.
+sub get_resource_migration_candidates {
+    my ($self) = @_;
+
+    my ($ss, $compiled_rules, $online_node_usage) =
+        $self->@{qw(ss compiled_rules online_node_usage)};
+    my ($node_affinity, $resource_affinity) =
+        $compiled_rules->@{qw(node-affinity resource-affinity)};
+
+    my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+    my @compact_migration_candidates = ();
+    for my $leader_sid (sort keys %$resource_bundles) {
+        my $current_leader_node = $ss->{$leader_sid}->{node};
+        my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
+
+        my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+        my ($together, $separate) =
+            get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
+        apply_negative_resource_affinity($separate, $target_nodes);
+
+        delete $target_nodes->{$current_leader_node};
+
+        next if !%$target_nodes;
+
+        push @compact_migration_candidates,
+            {
+                leader => $leader_sid,
+                nodes => [sort keys %$target_nodes],
+                resources => $resource_bundles->{$leader_sid},
+            };
+    }
+
+    return \@compact_migration_candidates;
+}
+
+sub load_balance {
+    my ($self) = @_;
+
+    my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
+    my ($auto_rebalance_opts) = $crs->{auto_rebalance};
+
+    return if !$auto_rebalance_opts->{enable};
+    return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
+    return if $self->any_resource_motion_queued_or_running();
+
+    my ($threshold, $method, $hold_duration, $margin) =
+        $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
+
+    my $imbalance = $online_node_usage->calculate_node_imbalance();
+
+    # do not load balance unless imbalance threshold has been exceeded
+    # consecutively for $hold_duration calls to load_balance()
+    if ($imbalance < $threshold) {
+        $self->{sustained_imbalance_round} = 0;
+        return;
+    } else {
+        $self->{sustained_imbalance_round}++;
+        return if $self->{sustained_imbalance_round} < $hold_duration;
+        $self->{sustained_imbalance_round} = 0;
+    }
+
+    my $candidates = $self->get_resource_migration_candidates();
+
+    my $result;
+    if ($method eq 'bruteforce') {
+        $result = $online_node_usage->select_best_balancing_migration($candidates);
+    } elsif ($method eq 'topsis') {
+        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
+    }
+
+    # happens if $candidates is empty or $method isn't handled above
+    return if !$result;
+
+    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
+
+    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
+    return if $relative_change < $margin;
+
+    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
+
+    my (undef, $type, $id) = $haenv->parse_sid($sid);
+    my $task = $type eq 'vm' ? "migrate" : "relocate";
+    my $cmd = "$task $sid $target";
+
+    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
+    $haenv->log(
+        'info',
+        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
+    );
+
+    $self->queue_resource_motion($cmd, $task, $sid, $target);
+}
+
 sub cleanup {
     my ($self) = @_;
 
@@ -466,6 +624,21 @@ sub queue_resource_motion {
     }
 }
 
+sub any_resource_motion_queued_or_running {
+    my ($self) = @_;
+
+    my ($ss) = $self->@{qw(ss)};
+
+    for my $sid (keys %$ss) {
+        my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
+
+        return 1 if $state eq 'migrate' || $state eq 'relocate';
+        return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
+    }
+
+    return 0;
+}
+
 # read new crm commands and save them into crm master status
 sub update_crm_commands {
     my ($self) = @_;
@@ -902,6 +1075,10 @@ sub manage {
             return; # disarm active and progressing, skip normal service state machine
         }
         # disarm deferred - fall through but only process services in transient states
+    } else {
+        # load balance only if disarm is disabled as during a deferred disarm
+        # the HA Manager should not introduce any new migrations
+        $self->load_balance();
     }
 
     $self->{all_lrms_disarmed} = 0;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 43feb041..659ab30a 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -60,6 +60,40 @@ sub remove_service_usage {
     die "implement in subclass";
 }
 
+sub calculate_node_imbalance {
+    my ($self) = @_;
+
+    die "implement in subclass";
+}
+
+sub score_best_balancing_migrations {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    die "implement in subclass";
+}
+
+sub select_best_balancing_migration {
+    my ($self, $migration_candidates) = @_;
+
+    my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
+
+    return $migrations->[0];
+}
+
+sub score_best_balancing_migrations_topsis {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    die "implement in subclass";
+}
+
+sub select_best_balancing_migration_topsis {
+    my ($self, $migration_candidates) = @_;
+
+    my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
+
+    return $migrations->[0];
+}
+
 # Returns a hash with $nodename => $score pairs. A lower $score is better.
 sub score_nodes_to_start_service {
     my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
index 24c85a41..76d0feaa 100644
--- a/src/PVE/HA/Usage/Dynamic.pm
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -104,6 +104,39 @@ sub remove_service_usage {
     $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
 }
 
+sub calculate_node_imbalance {
+    my ($self) = @_;
+
+    my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+    $self->{haenv}->log('warning', "unable to calculate dynamic node imbalance - $@") if $@;
+
+    return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    my $migrations = eval {
+        $self->{scheduler}
+            ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+    };
+    $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+    return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    my $migrations = eval {
+        $self->{scheduler}
+            ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+    };
+    $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+    return $migrations;
+}
+
 sub score_nodes_to_start_service {
     my ($self, $sid) = @_;
 
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 8c7a614b..e67d5f5b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -111,6 +111,39 @@ sub remove_service_usage {
     $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
 }
 
+sub calculate_node_imbalance {
+    my ($self) = @_;
+
+    my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+    $self->{haenv}->log('warning', "unable to calculate static node imbalance - $@") if $@;
+
+    return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    my $migrations = eval {
+        $self->{scheduler}
+            ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+    };
+    $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+    return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+    my ($self, $migration_candidates, $limit) = @_;
+
+    my $migrations = eval {
+        $self->{scheduler}
+            ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+    };
+    $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+    return $migrations;
+}
+
 sub score_nodes_to_start_service {
     my ($self, $sid) = @_;
 
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (34 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  9:09   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
                   ` (5 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases document which resource bundles count as active and
stationary and ensure that get_active_stationary_resource_bundles(...)
produces the correct active, stationary resource bundles.

This is especially important because these resource bundles are used
for the load balancing candidate generation, which is passed to
score_best_balancing_migration_candidates($candidates, ...). The
PVE::HA::Usage::{Static,Dynamic} implementations validate these
candidates and fail with a user-visible error message.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 src/test/Makefile                 |   1 +
 src/test/test_resource_bundles.pl | 234 ++++++++++++++++++++++++++++++
 2 files changed, 235 insertions(+)
 create mode 100755 src/test/test_resource_bundles.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index 6da9e100..f72b755b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -6,6 +6,7 @@ test:
 	@echo "-- start regression tests --"
 	./test_failover1.pl
 	./test_rules_config.pl
+	./test_resource_bundles.pl
 	./ha-tester.pl
 	./test_fence_config.pl
 	@echo "-- end regression tests (success) --"
diff --git a/src/test/test_resource_bundles.pl b/src/test/test_resource_bundles.pl
new file mode 100755
index 00000000..d38dc516
--- /dev/null
+++ b/src/test/test_resource_bundles.pl
@@ -0,0 +1,234 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use Test::More;
+
+use PVE::HA::Manager;
+
+my $get_active_stationary_resource_bundle_tests = [
+    {
+        description => "trivial resource bundles",
+        services => {
+            'vm:101' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'started',
+                node => 'node1',
+            },
+        },
+        resource_affinity => {
+            positive => {},
+            negative => {},
+        },
+        resource_bundles => {
+            'vm:101' => [
+                'vm:101',
+            ],
+            'vm:102' => [
+                'vm:102',
+            ],
+        },
+    },
+    {
+        description => "simple resource bundle",
+        services => {
+            'vm:101' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'started',
+                node => 'node1',
+            },
+        },
+        resource_affinity => {
+            positive => {
+                'vm:101' => {
+                    'vm:102' => 1,
+                },
+                'vm:102' => {
+                    'vm:101' => 1,
+                },
+            },
+            negative => {},
+        },
+        resource_bundles => {
+            'vm:101' => [
+                'vm:101', 'vm:102',
+            ],
+        },
+    },
+    {
+        description => "resource bundle with first resource stopped",
+        services => {
+            'vm:101' => {
+                state => 'stopped',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:103' => {
+                state => 'started',
+                node => 'node1',
+            },
+        },
+        resource_affinity => {
+            positive => {
+                'vm:101' => {
+                    'vm:102' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:102' => {
+                    'vm:101' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:103' => {
+                    'vm:101' => 1,
+                    'vm:102' => 1,
+                },
+            },
+            negative => {},
+        },
+        resource_bundles => {
+            'vm:102' => [
+                'vm:102', 'vm:103',
+            ],
+        },
+    },
+    {
+        description => "resource bundle with some stopped resources",
+        services => {
+            'vm:101' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'stopped',
+                node => 'node1',
+            },
+            'vm:103' => {
+                state => 'started',
+                node => 'node1',
+            },
+        },
+        resource_affinity => {
+            positive => {
+                'vm:101' => {
+                    'vm:102' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:102' => {
+                    'vm:101' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:103' => {
+                    'vm:101' => 1,
+                    'vm:102' => 1,
+                },
+            },
+            negative => {},
+        },
+        resource_bundles => {
+            'vm:101' => [
+                'vm:101', 'vm:103',
+            ],
+        },
+    },
+    {
+        description => "resource bundle with moving resources",
+        services => {
+            'vm:101' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'migrate',
+                node => 'node2',
+                target => 'node1',
+            },
+            'vm:103' => {
+                state => 'relocate',
+                node => 'node3',
+                target => 'node1',
+            },
+        },
+        resource_affinity => {
+            positive => {
+                'vm:101' => {
+                    'vm:102' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:102' => {
+                    'vm:101' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:103' => {
+                    'vm:101' => 1,
+                    'vm:102' => 1,
+                },
+            },
+            negative => {},
+        },
+        resource_bundles => {},
+    },
+    # might happen if the resource bundle is generated even before the HA Manager
+    # puts the HA resources in migrate/relocate to make them adhere to the HA rules
+    {
+        description => "resource bundle with resources on different nodes",
+        services => {
+            'vm:101' => {
+                state => 'started',
+                node => 'node1',
+            },
+            'vm:102' => {
+                state => 'started',
+                node => 'node2',
+            },
+            'vm:103' => {
+                state => 'started',
+                node => 'node3',
+            },
+        },
+        resource_affinity => {
+            positive => {
+                'vm:101' => {
+                    'vm:102' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:102' => {
+                    'vm:101' => 1,
+                    'vm:103' => 1,
+                },
+                'vm:103' => {
+                    'vm:101' => 1,
+                    'vm:102' => 1,
+                },
+            },
+            negative => {},
+        },
+        resource_bundles => {},
+    },
+];
+
+my $tests = [
+    @$get_active_stationary_resource_bundle_tests,
+];
+
+plan(tests => scalar($tests->@*));
+
+for my $case ($get_active_stationary_resource_bundle_tests->@*) {
+    my ($ss, $resource_affinity) = $case->@{qw(services resource_affinity)};
+
+    my $result = PVE::HA::Manager::get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+    is_deeply($result, $case->{resource_bundles}, $case->{description});
+}
+
+done_testing();
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system test cases
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (35 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  9:33   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 38/40] test: add static " Daniel Kral
                   ` (4 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases document the basic behavior of the automatic load
rebalancer using the dynamic usage stats.

As an overview:

- Case 0: rebalancing system is inactive for no configured HA resources
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
          for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
          resources cause a significant node imbalance and converges
          once the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
          resources cause a significant node imbalance through dynamic
          changes in their usage
- Case 4: rebalancing system doesn't trigger a migration if the node
          imbalance is exceeded once but isn't sustained for at least
          the set hold duration

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 .../test-crs-dynamic-auto-rebalance0/README   |  2 +
 .../test-crs-dynamic-auto-rebalance0/cmdlist  |  3 +
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  1 +
 .../hardware_status                           |  5 ++
 .../log.expect                                | 11 +++
 .../manager_status                            |  1 +
 .../service_config                            |  1 +
 .../static_service_stats                      |  1 +
 .../test-crs-dynamic-auto-rebalance1/README   |  7 ++
 .../test-crs-dynamic-auto-rebalance1/cmdlist  |  3 +
 .../datacenter.cfg                            |  7 ++
 .../dynamic_service_stats                     |  3 +
 .../hardware_status                           |  5 ++
 .../log.expect                                | 25 ++++++
 .../manager_status                            |  1 +
 .../service_config                            |  3 +
 .../static_service_stats                      |  3 +
 .../test-crs-dynamic-auto-rebalance2/README   |  4 +
 .../test-crs-dynamic-auto-rebalance2/cmdlist  |  3 +
 .../datacenter.cfg                            |  7 ++
 .../dynamic_service_stats                     |  6 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 ++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  6 ++
 .../static_service_stats                      |  6 ++
 .../test-crs-dynamic-auto-rebalance3/README   |  4 +
 .../test-crs-dynamic-auto-rebalance3/cmdlist  | 16 ++++
 .../datacenter.cfg                            |  7 ++
 .../dynamic_service_stats                     |  9 +++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 80 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  9 +++
 .../static_service_stats                      |  9 +++
 .../test-crs-dynamic-auto-rebalance4/README   | 11 +++
 .../test-crs-dynamic-auto-rebalance4/cmdlist  | 13 +++
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  9 +++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 ++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  9 +++
 .../static_service_stats                      |  9 +++
 45 files changed, 451 insertions(+)
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats

diff --git a/src/test/test-crs-dynamic-auto-rebalance0/README b/src/test/test-crs-dynamic-auto-rebalance0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
new file mode 100644
index 00000000..6526c203
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-threshold": 0.7
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/log.expect b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
@@ -0,0 +1,11 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/manager_status b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/service_config b/src/test/test-crs-dynamic-auto-rebalance0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/README b/src/test/test-crs-dynamic-auto-rebalance1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource creates a high node imbalance,
+which would normally trigger a rebalancing migration, no possible migration
+can exceed the minimum imbalance improvement threshold, i.e., improve the
+imbalance enough.
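As a side note for reviewers, the minimum imbalance improvement gate this test exercises can be sketched as follows. This is hypothetical Python, not the actual Rust/Perl implementation; in particular, the `imbalance()` definition (load spread relative to the mean) and the threshold value are assumptions for illustration only:

```python
def imbalance(loads):
    # One plausible imbalance metric: spread between the most and least
    # loaded node, relative to the mean node load (assumed definition).
    mean = sum(loads) / len(loads)
    return (max(loads) - min(loads)) / mean if mean else 0.0

def improves_enough(before, after, min_improvement=0.1):
    # A rebalancing migration is only selected if it reduces the
    # cluster imbalance by more than the minimum improvement threshold.
    return imbalance(before) - imbalance(after) > min_improvement
```

With a single resource, any migration merely relocates the load, so the imbalance stays the same and no candidate passes the gate, which matches the expectation of this test case.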
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/service_config b/src/test/test-crs-dynamic-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/README b/src/test/test-crs-dynamic-auto-rebalance2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will
+rebalance multiple identical running HA resources from a single node onto
+the other cluster nodes to reach a minimal node imbalance in the
+homogeneous cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     80    node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info     80    node1/crm: got crm command: migrate vm:101 node2
+info     80    node1/crm: migrate service 'vm:101' to node 'node2'
+info     80    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info     81    node1/lrm: service vm:101 - start migrate to node 'node2'
+info     81    node1/lrm: service vm:101 - end migrate to node 'node2'
+info     83    node2/lrm: got lock 'ha_agent_node2_lock'
+info     83    node2/lrm: status change wait_for_agent_lock => active
+info    100    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    103    node2/lrm: starting service vm:101
+info    103    node2/lrm: service status vm:101 started
+info    160    node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info    160    node1/crm: got crm command: migrate vm:102 node3
+info    160    node1/crm: migrate service 'vm:102' to node 'node3'
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    161    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    161    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    165    node3/lrm: got lock 'ha_agent_node3_lock'
+info    165    node3/lrm: status change wait_for_agent_lock => active
+info    180    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    185    node3/lrm: starting service vm:102
+info    185    node3/lrm: service status vm:102 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/service_config b/src/test/test-crs-dynamic-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/README b/src/test/test-crs-dynamic-auto-rebalance3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will
+rebalance multiple running HA resources with different dynamic usages,
+where the usage stats of some HA resources change over time, to reach a
+minimal cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
@@ -0,0 +1,16 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+        "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+        "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+    ],
+    [
+        "service vm:101 set-dynamic-stats mem 1011",
+        "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+        "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+        "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+        "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+        "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+    "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+    "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+    "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+    "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+    "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
new file mode 100644
index 00000000..a07fe721
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
@@ -0,0 +1,80 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: adding new service 'vm:106' on node 'node3'
+info     20    node1/crm: adding new service 'vm:107' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     25    node3/lrm: starting service vm:106
+info     25    node3/lrm: service status vm:106 started
+info     25    node3/lrm: starting service vm:107
+info     25    node3/lrm: service status vm:107 started
+info    120      cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info    120      cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info    120      cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info    160    node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info    160    node1/crm: got crm command: migrate vm:105 node2
+info    160    node1/crm: migrate service 'vm:105' to node 'node2'
+info    160    node1/crm: service 'vm:105': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    165    node3/lrm: service vm:105 - start migrate to node 'node2'
+info    165    node3/lrm: service vm:105 - end migrate to node 'node2'
+info    180    node1/crm: service 'vm:105': state changed from 'migrate' to 'started'  (node = node2)
+info    183    node2/lrm: starting service vm:105
+info    183    node2/lrm: service status vm:105 started
+info    220      cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info    220      cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info    220      cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info    220      cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info    220      cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info    220      cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info    260    node1/crm: auto rebalance - migrate vm:103 to node1 (expected target imbalance: 0.4)
+info    260    node1/crm: got crm command: migrate vm:103 node1
+info    260    node1/crm: migrate service 'vm:103' to node 'node1'
+info    260    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node2, target = node1)
+info    263    node2/lrm: service vm:103 - start migrate to node 'node1'
+info    263    node2/lrm: service vm:103 - end migrate to node 'node1'
+info    280    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node1)
+info    281    node1/lrm: starting service vm:103
+info    281    node1/lrm: service status vm:103 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/service_config b/src/test/test-crs-dynamic-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" },
+    "vm:106": { "node": "node3", "state": "started" },
+    "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+    "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/README b/src/test/test-crs-dynamic-auto-rebalance4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage shows a transient spike that makes the nodes exceed the imbalance
+threshold, but falls back below the threshold before the hold duration
+expires.
+
+This test relies on the fact that each command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage right after the simulated spike, so that the current
+imbalance drops below the threshold again, undercuts the hold duration by one
+HA round and therefore prevents a rebalancing migration.
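As a side note, the hold-duration behaviour this test relies on can be sketched as a consecutive-round counter. This is hypothetical Python for illustration, not the actual implementation; the class and parameter names are invented, and the threshold value is an assumption:

```python
class HoldTimer:
    """Require the imbalance to persist before acting (sketch only)."""

    def __init__(self, hold_rounds):
        self.hold_rounds = hold_rounds
        self.rounds_above = 0

    def should_rebalance(self, imbalance, threshold=0.1):
        # Count consecutive rounds above the threshold; any round that
        # falls back below it (a transient spike) resets the counter.
        if imbalance > threshold:
            self.rounds_above += 1
        else:
            self.rounds_above = 0
        return self.rounds_above >= self.hold_rounds
```

A spike lasting 5 rounds against a hold duration of 6 rounds never fires, which is exactly the one-round undercut the test simulates.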
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
@@ -0,0 +1,13 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+        "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+        "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+    ],
+    [
+        "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+        "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+        "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..14059a3e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-hold-duration": 6
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+    "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+    "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+    "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+    "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+    "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: adding new service 'vm:106' on node 'node3'
+info     20    node1/crm: adding new service 'vm:107' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     25    node3/lrm: starting service vm:106
+info     25    node3/lrm: service status vm:106 started
+info     25    node3/lrm: starting service vm:107
+info     25    node3/lrm: service status vm:107 started
+info    120      cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info    120      cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info    120      cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info    220      cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info    220      cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info    220      cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/service_config b/src/test/test-crs-dynamic-auto-rebalance4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" },
+    "vm:106": { "node": "node3", "state": "started" },
+    "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+    "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 38/40] test: add static automatic rebalancing system test cases
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (36 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  9:44   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
                   ` (3 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases are derivatives of the dynamic automatic rebalancing
system test cases 1 to 3 and ensure that the automatic rebalancing
system provides the same basic functionality when operating on static
usage information.

The other dynamic usage test cases are not included here, because they
are invariant to the kind of usage information provided and only cover
further edge cases.

As an overview:

- Case 1: rebalancing system doesn't trigger any rebalancing migrations
          for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
          resources cause a significant node imbalance and converges
          once the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
          resources cause a significant node imbalance through changes
          in their static usage
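As a side note, the "expected target imbalance" values in the expected
logs are consistent with using the coefficient of variation (stddev /
mean) of the per-node load as the imbalance metric. A minimal sketch
under that assumption (this is an illustration, not the actual
ha-manager code):

```python
from statistics import mean, pstdev

def imbalance(loads):
    """Coefficient of variation of per-node load (0 = perfectly balanced)."""
    avg = mean(loads)
    return pstdev(loads) / avg if avg else 0.0

# Case 1: a single resource creates a hotspot, but migrating it merely
# moves the hotspot, so no migration can improve the imbalance.
assert imbalance([1, 0, 0]) == imbalance([0, 1, 0])

# Case 2: four equal resources start on node1 (loads in units of one
# resource); the per-step values match test case 2's log.expect:
assert round(imbalance([3, 1, 0]), 2) == 0.94  # after first migration
assert round(imbalance([2, 1, 1]), 2) == 0.35  # after second migration
```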

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 .../test-crs-static-auto-rebalance1/README    |  7 ++
 .../test-crs-static-auto-rebalance1/cmdlist   |  3 +
 .../datacenter.cfg                            |  7 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 25 ++++++
 .../manager_status                            |  1 +
 .../service_config                            |  3 +
 .../static_service_stats                      |  3 +
 .../test-crs-static-auto-rebalance2/README    |  4 +
 .../test-crs-static-auto-rebalance2/cmdlist   |  3 +
 .../datacenter.cfg                            |  7 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 ++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  6 ++
 .../static_service_stats                      |  6 ++
 .../test-crs-static-auto-rebalance3/README    |  3 +
 .../test-crs-static-auto-rebalance3/cmdlist   | 15 ++++
 .../datacenter.cfg                            |  7 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 79 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  9 +++
 .../static_service_stats                      |  9 +++
 24 files changed, 273 insertions(+)
 create mode 100644 src/test/test-crs-static-auto-rebalance1/README
 create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-static-auto-rebalance2/README
 create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-static-auto-rebalance3/README
 create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats

diff --git a/src/test/test-crs-static-auto-rebalance1/README b/src/test/test-crs-static-auto-rebalance1/README
new file mode 100644
index 00000000..8f97ac55
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with static usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-static-auto-rebalance1/cmdlist b/src/test/test-crs-static-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance1/datacenter.cfg b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "static",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance1/hardware_status b/src/test/test-crs-static-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/log.expect b/src/test/test-crs-static-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d2c27bec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance1/manager_status b/src/test/test-crs-static-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance1/service_config b/src/test/test-crs-static-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/static_service_stats b/src/test/test-crs-static-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/README b/src/test/test-crs-static-auto-rebalance2/README
new file mode 100644
index 00000000..1d1b9d6e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running, homogeneous HA resources on a single node to other
+cluster nodes to reach a minimum cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-static-auto-rebalance2/cmdlist b/src/test/test-crs-static-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance2/datacenter.cfg b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "static",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance2/hardware_status b/src/test/test-crs-static-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/log.expect b/src/test/test-crs-static-auto-rebalance2/log.expect
new file mode 100644
index 00000000..3df96d83
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     80    node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info     80    node1/crm: got crm command: migrate vm:101 node2
+info     80    node1/crm: migrate service 'vm:101' to node 'node2'
+info     80    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info     81    node1/lrm: service vm:101 - start migrate to node 'node2'
+info     81    node1/lrm: service vm:101 - end migrate to node 'node2'
+info     83    node2/lrm: got lock 'ha_agent_node2_lock'
+info     83    node2/lrm: status change wait_for_agent_lock => active
+info    100    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    103    node2/lrm: starting service vm:101
+info    103    node2/lrm: service status vm:101 started
+info    160    node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info    160    node1/crm: got crm command: migrate vm:102 node3
+info    160    node1/crm: migrate service 'vm:102' to node 'node3'
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    161    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    161    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    165    node3/lrm: got lock 'ha_agent_node3_lock'
+info    165    node3/lrm: status change wait_for_agent_lock => active
+info    180    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    185    node3/lrm: starting service vm:102
+info    185    node3/lrm: service status vm:102 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance2/manager_status b/src/test/test-crs-static-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-auto-rebalance2/service_config b/src/test/test-crs-static-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/static_service_stats b/src/test/test-crs-static-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/README b/src/test/test-crs-static-auto-rebalance3/README
new file mode 100644
index 00000000..2f57dac2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/README
@@ -0,0 +1,3 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running HA resources, where the static usage stats of some
+HA resources change over time, to reach minimum cluster node imbalance.
diff --git a/src/test/test-crs-static-auto-rebalance3/cmdlist b/src/test/test-crs-static-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..f18798b0
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/cmdlist
@@ -0,0 +1,15 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:105 set-static-stats maxcpu 8.0 maxmem 8192",
+        "service vm:106 set-static-stats maxcpu 8.0 maxmem 8192",
+        "service vm:107 set-static-stats maxcpu 8.0 maxmem 8192"
+    ],
+    [
+        "service vm:101 set-static-stats maxcpu 1.0 maxmem 1024",
+        "service vm:102 set-static-stats maxcpu 1.0 maxmem 1024",
+        "service vm:103 set-static-stats maxcpu 1.0 maxmem 1024",
+        "service vm:104 set-static-stats maxcpu 1.0 maxmem 1024",
+        "service vm:105 set-static-stats maxcpu 1.0 maxmem 1024"
+    ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance3/datacenter.cfg b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "static",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance3/hardware_status b/src/test/test-crs-static-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/log.expect b/src/test/test-crs-static-auto-rebalance3/log.expect
new file mode 100644
index 00000000..ddb4e5ec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/log.expect
@@ -0,0 +1,79 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: adding new service 'vm:106' on node 'node3'
+info     20    node1/crm: adding new service 'vm:107' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     25    node3/lrm: starting service vm:106
+info     25    node3/lrm: service status vm:106 started
+info     25    node3/lrm: starting service vm:107
+info     25    node3/lrm: service status vm:107 started
+info    120      cmdlist: execute service vm:105 set-static-stats maxcpu 8.0 maxmem 8192
+info    120      cmdlist: execute service vm:106 set-static-stats maxcpu 8.0 maxmem 8192
+info    120      cmdlist: execute service vm:107 set-static-stats maxcpu 8.0 maxmem 8192
+info    160    node1/crm: auto rebalance - migrate vm:105 to node1 (expected target imbalance: 0.47)
+info    160    node1/crm: got crm command: migrate vm:105 node1
+info    160    node1/crm: migrate service 'vm:105' to node 'node1'
+info    160    node1/crm: service 'vm:105': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    165    node3/lrm: service vm:105 - start migrate to node 'node1'
+info    165    node3/lrm: service vm:105 - end migrate to node 'node1'
+info    180    node1/crm: service 'vm:105': state changed from 'migrate' to 'started'  (node = node1)
+info    181    node1/lrm: starting service vm:105
+info    181    node1/lrm: service status vm:105 started
+info    220      cmdlist: execute service vm:101 set-static-stats maxcpu 1.0 maxmem 1024
+info    220      cmdlist: execute service vm:102 set-static-stats maxcpu 1.0 maxmem 1024
+info    220      cmdlist: execute service vm:103 set-static-stats maxcpu 1.0 maxmem 1024
+info    220      cmdlist: execute service vm:104 set-static-stats maxcpu 1.0 maxmem 1024
+info    220      cmdlist: execute service vm:105 set-static-stats maxcpu 1.0 maxmem 1024
+info    260    node1/crm: auto rebalance - migrate vm:106 to node2 (expected target imbalance: 0.42)
+info    260    node1/crm: got crm command: migrate vm:106 node2
+info    260    node1/crm: migrate service 'vm:106' to node 'node2'
+info    260    node1/crm: service 'vm:106': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    265    node3/lrm: service vm:106 - start migrate to node 'node2'
+info    265    node3/lrm: service vm:106 - end migrate to node 'node2'
+info    280    node1/crm: service 'vm:106': state changed from 'migrate' to 'started'  (node = node2)
+info    283    node2/lrm: starting service vm:106
+info    283    node2/lrm: service status vm:106 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance3/manager_status b/src/test/test-crs-static-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance3/service_config b/src/test/test-crs-static-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" },
+    "vm:106": { "node": "node3", "state": "started" },
+    "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/static_service_stats b/src/test/test-crs-static-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..560a6fe8
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:102": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:103": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:104": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:105": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:106": { "maxcpu": 2.0, "maxmem": 2147483648 },
+    "vm:107": { "maxcpu": 2.0, "maxmem": 2147483648 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (37 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 38/40] test: add static " Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31  9:48   ` Dominik Rusovac
  2026-03-30 14:30 ` [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
                   ` (2 subsequent siblings)
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases are clones of the dynamic automatic rebalancing system
test cases 0 through 4, which ensure that the same basic functionality
is provided with the automatic rebalancing system using the TOPSIS
method.

The expected outputs are exactly the same, except for test case 3,
where the second migration changes from

    vm:103 to node1 with an expected target imbalance of 0.40

to

    vm:103 to node3 with an expected target imbalance of 0.43.
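For readers unfamiliar with the method: TOPSIS (Technique for Order of
Preference by Similarity to Ideal Solution) ranks alternatives by their
closeness to an ideal solution across weighted criteria. A minimal
generic sketch follows; it is not the proxmox-resource-scheduling
implementation, and the criteria and values are illustrative only:

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) in [0, 1].

    benefit[j] is True if a higher value is better for criterion j.
    """
    n_crit = len(matrix[0])
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # ideal best/worst value per criterion
    best = [max(col) if benefit[j] else min(col)
            for j, col in enumerate(zip(*v))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*v))]
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst) if d_best + d_worst else 0.0)
    return scores

# candidate target nodes scored on (cpu load, mem load); lower is better:
scores = topsis([[0.2, 0.3], [0.6, 0.7], [0.4, 0.4]],
                weights=[0.5, 0.5], benefit=[False, False])
assert scores.index(max(scores)) == 0  # least-loaded node ranks best
```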

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 .../README                                    |  2 +
 .../cmdlist                                   |  3 +
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  1 +
 .../hardware_status                           |  5 ++
 .../log.expect                                | 11 +++
 .../manager_status                            |  1 +
 .../service_config                            |  1 +
 .../static_service_stats                      |  1 +
 .../README                                    |  7 ++
 .../cmdlist                                   |  3 +
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  3 +
 .../hardware_status                           |  5 ++
 .../log.expect                                | 25 ++++++
 .../manager_status                            |  1 +
 .../service_config                            |  3 +
 .../static_service_stats                      |  3 +
 .../README                                    |  4 +
 .../cmdlist                                   |  3 +
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  6 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 ++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  6 ++
 .../static_service_stats                      |  6 ++
 .../README                                    |  4 +
 .../cmdlist                                   | 16 ++++
 .../datacenter.cfg                            |  8 ++
 .../dynamic_service_stats                     |  9 +++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 80 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  9 +++
 .../static_service_stats                      |  9 +++
 .../README                                    | 11 +++
 .../cmdlist                                   | 13 +++
 .../datacenter.cfg                            |  9 +++
 .../dynamic_service_stats                     |  9 +++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 ++++++++++++++
 .../manager_status                            |  1 +
 .../service_config                            |  9 +++
 .../static_service_stats                      |  9 +++
 45 files changed, 455 insertions(+)
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
 create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats

diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/README b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-method": "topsis"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
@@ -0,0 +1,11 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/README b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-method": "topsis"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
@@ -0,0 +1,25 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/README b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources from a single node to the
+other cluster nodes until the homogeneous cluster reaches a minimal node
+imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-method": "topsis"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+    "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     80    node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info     80    node1/crm: got crm command: migrate vm:101 node2
+info     80    node1/crm: migrate service 'vm:101' to node 'node2'
+info     80    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info     81    node1/lrm: service vm:101 - start migrate to node 'node2'
+info     81    node1/lrm: service vm:101 - end migrate to node 'node2'
+info     83    node2/lrm: got lock 'ha_agent_node2_lock'
+info     83    node2/lrm: status change wait_for_agent_lock => active
+info    100    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    103    node2/lrm: starting service vm:101
+info    103    node2/lrm: service status vm:101 started
+info    160    node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info    160    node1/crm: got crm command: migrate vm:102 node3
+info    160    node1/crm: migrate service 'vm:102' to node 'node3'
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    161    node1/lrm: service vm:102 - start migrate to node 'node3'
+info    161    node1/lrm: service vm:102 - end migrate to node 'node3'
+info    165    node3/lrm: got lock 'ha_agent_node3_lock'
+info    165    node3/lrm: status change wait_for_agent_lock => active
+info    180    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
+info    185    node3/lrm: starting service vm:102
+info    185    node3/lrm: service status vm:102 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/README b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
@@ -0,0 +1,16 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+        "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+        "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+    ],
+    [
+        "service vm:101 set-dynamic-stats mem 1011",
+        "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+        "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+        "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+        "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+        "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-method": "topsis"
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+    "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+    "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+    "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+    "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+    "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
new file mode 100644
index 00000000..4aaddd39
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
@@ -0,0 +1,80 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: adding new service 'vm:106' on node 'node3'
+info     20    node1/crm: adding new service 'vm:107' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     25    node3/lrm: starting service vm:106
+info     25    node3/lrm: service status vm:106 started
+info     25    node3/lrm: starting service vm:107
+info     25    node3/lrm: service status vm:107 started
+info    120      cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info    120      cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info    120      cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info    160    node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info    160    node1/crm: got crm command: migrate vm:105 node2
+info    160    node1/crm: migrate service 'vm:105' to node 'node2'
+info    160    node1/crm: service 'vm:105': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    165    node3/lrm: service vm:105 - start migrate to node 'node2'
+info    165    node3/lrm: service vm:105 - end migrate to node 'node2'
+info    180    node1/crm: service 'vm:105': state changed from 'migrate' to 'started'  (node = node2)
+info    183    node2/lrm: starting service vm:105
+info    183    node2/lrm: service status vm:105 started
+info    220      cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info    220      cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info    220      cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info    220      cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info    220      cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info    220      cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info    260    node1/crm: auto rebalance - migrate vm:103 to node3 (expected target imbalance: 0.43)
+info    260    node1/crm: got crm command: migrate vm:103 node3
+info    260    node1/crm: migrate service 'vm:103' to node 'node3'
+info    260    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node2, target = node3)
+info    263    node2/lrm: service vm:103 - start migrate to node 'node3'
+info    263    node2/lrm: service vm:103 - end migrate to node 'node3'
+info    280    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node3)
+info    285    node3/lrm: starting service vm:103
+info    285    node3/lrm: service status vm:103 started
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" },
+    "vm:106": { "node": "node3", "state": "started" },
+    "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+    "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/README b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage spikes transiently, making the nodes exceed the imbalance
+threshold, but then falls back below the threshold before the hold
+duration expires.
+
+This test relies on the fact that every command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, setting
+the dynamic usage back to values where the current imbalance is lower than the
+threshold right after simulating the spike undercuts the hold duration by one
+HA round and therefore prevents triggering a rebalancing migration.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
@@ -0,0 +1,13 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+        "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+        "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+    ],
+    [
+        "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+        "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+        "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
new file mode 100644
index 00000000..0fb3fdc3
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-method": "topsis",
+        "ha-auto-rebalance-hold-duration": 6
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+    "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+    "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+    "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+    "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+    "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: adding new service 'vm:105' on node 'node3'
+info     20    node1/crm: adding new service 'vm:106' on node 'node3'
+info     20    node1/crm: adding new service 'vm:107' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     25    node3/lrm: starting service vm:106
+info     25    node3/lrm: service status vm:106 started
+info     25    node3/lrm: starting service vm:107
+info     25    node3/lrm: service status vm:107 started
+info    120      cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info    120      cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info    120      cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info    220      cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info    220      cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info    220      cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info    820     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" },
+    "vm:105": { "node": "node3", "state": "started" },
+    "vm:106": { "node": "node3", "state": "started" },
+    "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
@@ -0,0 +1,9 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+    "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (38 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-03-30 14:30 ` Daniel Kral
  2026-03-31 10:06   ` Dominik Rusovac
  2026-03-31 20:44 ` partially-applied: [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Thomas Lamprecht
  2026-04-02 12:55 ` superseded: " Daniel Kral
  41 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-30 14:30 UTC (permalink / raw)
  To: pve-devel

These test cases document and verify some behaviors of the automatic
rebalancing system in combination with HA affinity rules.

All of these test cases use only dynamic usage information and the bruteforce
method, as the waiting on ongoing migrations and the candidate generation are
invariant to those parameters.

As an overview:

- Case 1: rebalancing system acknowledges node affinity rules
- Case 2: rebalancing system considers HA resources in strict positive
          resource affinity rules as a single unit (a resource bundle)
          and will not split them apart
- Case 3: rebalancing system will wait on the migration of a not-yet
          enforced strict positive resource affinity rule, i.e., the
          HA resources still need to migrate to their common node
- Case 4: rebalancing system will acknowledge strict negative resource
          affinity rules, but will still try to minimize the node
          imbalance as much as possible
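
For reference, the constraints exercised by cases 1 and 2 are defined in the
per-test `rules_config` files: a strict node affinity rule pinning resources
to node1, and a positive resource affinity rule keeping two resources
together (the rule names are arbitrary identifiers):

    node-affinity: vm101-stays-on-node1
            nodes node1
            resources vm:101,vm:102,vm:103
            strict 1

    resource-affinity: vms-stay-together
            resources vm:101,vm:102
            affinity positive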

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v2 -> v3:
- none

 .../README                                    |  7 +++
 .../cmdlist                                   |  8 +++
 .../datacenter.cfg                            |  7 +++
 .../dynamic_service_stats                     |  5 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 49 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 ++
 .../service_config                            |  5 ++
 .../static_service_stats                      |  5 ++
 .../README                                    | 12 ++++
 .../cmdlist                                   |  8 +++
 .../datacenter.cfg                            |  7 +++
 .../dynamic_service_stats                     |  4 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 53 +++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  3 +
 .../service_config                            |  4 ++
 .../static_service_stats                      |  4 ++
 .../README                                    | 14 +++++
 .../cmdlist                                   |  3 +
 .../datacenter.cfg                            |  8 +++
 .../dynamic_service_stats                     |  6 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 +++++++++++++++++++
 .../manager_status                            | 31 ++++++++++
 .../rules_config                              |  3 +
 .../service_config                            |  6 ++
 .../static_service_stats                      |  6 ++
 .../README                                    | 14 +++++
 .../cmdlist                                   |  3 +
 .../datacenter.cfg                            |  7 +++
 .../dynamic_service_stats                     |  6 ++
 .../hardware_status                           |  5 ++
 .../log.expect                                | 59 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  7 +++
 .../service_config                            |  6 ++
 .../static_service_stats                      |  6 ++
 40 files changed, 452 insertions(+)
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
 create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats

diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/README b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
new file mode 100644
index 00000000..8504755f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information will not
+auto rebalance running HA resources that cause a node imbalance exceeding the
+threshold, because their HA node affinity rules strictly require them to be
+kept on specific nodes.
+
+As a sanity check, the added HA resource, which is not part of the node
+affinity rule, is rebalanced to another node to lower the imbalance.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..6ee04948
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
@@ -0,0 +1,8 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:104 add node1 started 1",
+        "service vm:104 set-static-stats maxcpu 8.0 maxmem 8192",
+        "service vm:104 set-dynamic-stats cpu 4.0 mem 4096"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..02133ab0
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+    "vm:103": { "cpu": 4.7, "mem": 5242880000 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d0b2aee2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
@@ -0,0 +1,49 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute service vm:104 add node1 started 1
+info    120      cmdlist: execute service vm:104 set-static-stats maxcpu 8.0 maxmem 8192
+info    120      cmdlist: execute service vm:104 set-dynamic-stats cpu 4.0 mem 4096
+info    120    node1/crm: adding new service 'vm:104' on node 'node1'
+info    120    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info    140    node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.98)
+info    140    node1/crm: got crm command: migrate vm:104 node2
+info    140    node1/crm: migrate service 'vm:104' to node 'node2'
+info    140    node1/crm: service 'vm:104': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    141    node1/lrm: service vm:104 - start migrate to node 'node2'
+info    141    node1/lrm: service vm:104 - end migrate to node 'node2'
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    160    node1/crm: service 'vm:104': state changed from 'migrate' to 'started'  (node = node2)
+info    163    node2/lrm: starting service vm:104
+info    163    node2/lrm: service status vm:104 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
new file mode 100644
index 00000000..00f615e9
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-stays-on-node1
+	nodes node1
+	resources vm:101,vm:102,vm:103
+	strict 1
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
new file mode 100644
index 00000000..57e3579d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..b11cc5eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/README b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
new file mode 100644
index 00000000..be072f6d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
@@ -0,0 +1,12 @@
+Test that the auto rebalance system with dynamic usage information will
+consider running HA resources in strict positive resource affinity rules as
+bundles, which can only be moved to other nodes as a single unit.
+
+Therefore, even though the two initial HA resources would be split apart,
+because these cause a node imbalance in the cluster, the auto rebalance system
+does not issue a rebalancing migration, because they must stay together.
+
+As a sanity check, adding another HA resource, which is not part of the strict
+positive resource affinity rule, will cause a rebalancing migration: in this
+case the resource bundle itself is migrated, because the leading resource
+'vm:101' comes first alphabetically.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..61373367
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
@@ -0,0 +1,8 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [
+        "service vm:103 add node1 started 1",
+        "service vm:103 set-static-stats maxcpu 8.0 maxmem 8192",
+        "service vm:103 set-dynamic-stats cpu 4.0 mem 4096"
+    ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..4f81dfe2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:102": { "cpu": 7.9, "mem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
new file mode 100644
index 00000000..48501321
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
@@ -0,0 +1,53 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute service vm:103 add node1 started 1
+info    120      cmdlist: execute service vm:103 set-static-stats maxcpu 8.0 maxmem 8192
+info    120      cmdlist: execute service vm:103 set-dynamic-stats cpu 4.0 mem 4096
+info    120    node1/crm: adding new service 'vm:103' on node 'node1'
+info    120    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info    140    node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.86)
+info    140    node1/crm: got crm command: migrate vm:101 node2
+info    140    node1/crm: crm command 'migrate vm:101 node2' - migrate service 'vm:102' to node 'node2' (service 'vm:102' in positive affinity with service 'vm:101')
+info    140    node1/crm: migrate service 'vm:101' to node 'node2'
+info    140    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    140    node1/crm: migrate service 'vm:102' to node 'node2'
+info    140    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    141    node1/lrm: service vm:101 - start migrate to node 'node2'
+info    141    node1/lrm: service vm:101 - end migrate to node 'node2'
+info    141    node1/lrm: service vm:102 - start migrate to node 'node2'
+info    141    node1/lrm: service vm:102 - end migrate to node 'node2'
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    160    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    160    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node2)
+info    163    node2/lrm: starting service vm:101
+info    163    node2/lrm: service status vm:101 started
+info    163    node2/lrm: starting service vm:102
+info    163    node2/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
new file mode 100644
index 00000000..e1948a00
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+	resources vm:101,vm:102
+	affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
new file mode 100644
index 00000000..880e0a59
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..455ae043
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
@@ -0,0 +1,4 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/README b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
new file mode 100644
index 00000000..4b4d4855
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will wait on
+a resource motion being finished, because a strict positive resource affinity
+rule is not correctly enforced yet.
+
+This test case manipulates the manager status in such a way, so that the HA
+Manager will assume that the not-yet-migrated HA resource in the strict
+positive resource affinity rule is still migrating as currently the integration
+tests do not support prolonged migrations.
+
+Furthermore, auto rebalancing migrations are forced to be issued as soon as
+possible with the hold duration being set to 0. This ensures that if the auto
+rebalance system would not wait on the ongoing migration, the auto rebalancing
+migration would be done right away in the same round as the HA resources being
+acknowledged as running.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..181ea848
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1,
+        "ha-auto-rebalance-hold-duration": 0
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..d35a2c8f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+    "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+    "vm:103": { "cpu": 4.7, "mem": 5242880000 },
+    "vm:104": { "cpu": 4.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
new file mode 100644
index 00000000..1242f827
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: service vm:101 - start migrate to node 'node1'
+info     23    node2/lrm: service vm:101 - end migrate to node 'node1'
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node1)
+info     41    node1/lrm: starting service vm:101
+info     41    node1/lrm: service status vm:101 started
+info     60    node1/crm: auto rebalance - migrate vm:102 to node2 (expected target imbalance: 0.72)
+info     60    node1/crm: got crm command: migrate vm:102 node2
+info     60    node1/crm: migrate service 'vm:102' to node 'node2'
+info     60    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info     61    node1/lrm: service vm:102 - start migrate to node 'node2'
+info     61    node1/lrm: service vm:102 - end migrate to node 'node2'
+info     80    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node2)
+info     83    node2/lrm: starting service vm:102
+info     83    node2/lrm: service status vm:102 started
+info    100    node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.27)
+info    100    node1/crm: got crm command: migrate vm:101 node3
+info    100    node1/crm: crm command 'migrate vm:101 node3' - migrate service 'vm:103' to node 'node3' (service 'vm:103' in positive affinity with service 'vm:101')
+info    100    node1/crm: migrate service 'vm:101' to node 'node3'
+info    100    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    100    node1/crm: migrate service 'vm:103' to node 'node3'
+info    100    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    101    node1/lrm: service vm:101 - start migrate to node 'node3'
+info    101    node1/lrm: service vm:101 - end migrate to node 'node3'
+info    101    node1/lrm: service vm:103 - start migrate to node 'node3'
+info    101    node1/lrm: service vm:103 - end migrate to node 'node3'
+info    105    node3/lrm: got lock 'ha_agent_node3_lock'
+info    105    node3/lrm: status change wait_for_agent_lock => active
+info    120    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    120    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node3)
+info    125    node3/lrm: starting service vm:101
+info    125    node3/lrm: service status vm:101 started
+info    125    node3/lrm: starting service vm:103
+info    125    node3/lrm: service status vm:103 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
new file mode 100644
index 00000000..cf90037c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
@@ -0,0 +1,31 @@
+{
+    "master_node": "node1",
+    "node_status": {
+	"node1":"online",
+	"node2":"online",
+	"node3":"online"
+    },
+    "service_status": {
+	"vm:101": {
+	    "node": "node2",
+	    "state": "migrate",
+	    "target": "node1",
+	    "uid": "RoPGTlvNYq/oZFokv9fgWw"
+	},
+        "vm:102": {
+            "node": "node1",
+            "state": "started",
+	    "uid": "fR3i18EHk6DhF8Zd2jddNX"
+        },
+	"vm:103": {
+	    "node": "node1",
+	    "state": "started",
+	    "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+	},
+        "vm:104": {
+            "node": "node1",
+            "state": "started",
+	    "uid": "23hk23EHk6DhF8Zd0218DD"
+        }
+    }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
new file mode 100644
index 00000000..2c3f3171
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+	resources vm:101,vm:103
+	affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
new file mode 100644
index 00000000..3dadaabc
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/README b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
new file mode 100644
index 00000000..e304cc22
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance a HA resource on the same node as another HA resource, which are in a
+strict negative resource affinity rule.
+
+There is a high node imbalance since vm:101 and vm:102 on node1 cause a higher
+usage than node2 and node3 have. Even though it would be ideal to move one of
+these to node2, because it has a very low usage, these cannot be moved there as
+both vm:101 and vm:102 are in a strict negative resource affinity rule with a
+HA resource on node2 respectively.
+
+To minimize the imbalance in the cluster, one of the HA resources from node1 is
+migrated to node3 first, and afterwards the HA resource on node3, which is not
+in a strict negative resource affinity rule with a HA resource on node2, will
+be migrated to node2.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
@@ -0,0 +1,3 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "dynamic",
+        "ha-auto-rebalance": 1
+    }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..083f338b
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "cpu": 0.9, "mem": 4294967296 },
+    "vm:102": { "cpu": 2.4, "mem": 2621440000 },
+    "vm:103": { "cpu": 0.0, "mem": 0 },
+    "vm:104": { "cpu": 1.0, "mem": 1073741824 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+  "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
new file mode 100644
index 00000000..58f1b481
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'dynamic'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node2'
+info     20    node1/crm: adding new service 'vm:104' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:104
+info     25    node3/lrm: service status vm:104 started
+info     80    node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.72)
+info     80    node1/crm: got crm command: migrate vm:101 node3
+info     80    node1/crm: migrate service 'vm:101' to node 'node3'
+info     80    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info     81    node1/lrm: service vm:101 - start migrate to node 'node3'
+info     81    node1/lrm: service vm:101 - end migrate to node 'node3'
+info    100    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node3)
+info    105    node3/lrm: starting service vm:101
+info    105    node3/lrm: service status vm:101 started
+info    160    node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.33)
+info    160    node1/crm: got crm command: migrate vm:104 node2
+info    160    node1/crm: migrate service 'vm:104' to node 'node2'
+info    160    node1/crm: service 'vm:104': state changed from 'started' to 'migrate'  (node = node3, target = node2)
+info    165    node3/lrm: service vm:104 - start migrate to node 'node2'
+info    165    node3/lrm: service vm:104 - end migrate to node 'node2'
+info    180    node1/crm: service 'vm:104': state changed from 'migrate' to 'started'  (node = node2)
+info    183    node2/lrm: starting service vm:104
+info    183    node2/lrm: service status vm:104 started
+info    620     hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
new file mode 100644
index 00000000..eef5460f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
@@ -0,0 +1,7 @@
+resource-affinity: vms-stay-apart1
+	resources vm:101,vm:103
+	affinity negative
+
+resource-affinity: vms-stay-apart2
+	resources vm:102,vm:103
+	affinity negative
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
new file mode 100644
index 00000000..16bffacf
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node1", "state": "started" },
+    "vm:103": { "node": "node2", "state": "started" },
+    "vm:104": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+    "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
-- 
2.47.3





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
  2026-03-30 14:30 ` [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-31  6:01   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  6:01 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> This makes moving the function out into its own module easier to follow,
> which in turn is needed to generalize score_nodes_to_start_service(...)
> for other usage stats in the following patches.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
> ---
> changes v2 -> v3:
> - none

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate
  2026-03-30 14:30 ` [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
@ 2026-03-31  6:01   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  6:01 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> This is done so score_nodes_to_start_service(...) can be generalized in
> the following patches, so other usage stat structs can reuse the same
> scoring method.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
> ---
> changes v2 -> v3:
> - none

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate
  2026-03-30 14:30 ` [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
@ 2026-03-31  6:02   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  6:02 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The term `resource` is more appropriate with respect to the crate name
> and also the preferred name for the current main application in the HA
> Manager.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
> ---
> changes v2 -> v3:
> - none

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation
  2026-03-30 14:30 ` [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-31  6:11   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  6:11 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

changes lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The existing score_nodes_to_start_resource(...) function is dependent on
> the StaticNodeUsage and StaticServiceUsage structs.
>
> To use this function for other usage stats structs as well, declare
> generic NodeStats and ResourceStats structs that users can convert
> into. These are used to make score_nodes_to_start_resource(...) and its
> documentation generic.
>
> The pve_static::score_nodes_to_start_service(...) is marked as
> deprecated accordingly. The usage-related structs are marked as
> deprecated as well as the specific usage implementations - including
> their serialization and deserialization - should be handled by the
> caller now.
>
> This is best viewed with the git option --ignore-all-space.
>
> No functional changes intended.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - add Clone and Debug derives to Scheduler struct
> - change second unlimited_cpu_resource_stats variable to
>   combined_resource_stats; was a Copy-Paste error
>

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
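
The generalization described above can be sketched in a few lines. To be
clear: the struct fields, function name, and the max-of-relative-loads
formula below are assumptions for illustration only, not the actual
proxmox-resource-scheduling API.

```rust
// Illustrative sketch only: names and the load formula are assumptions
// loosely based on the commit message, not the actual crate code.

#[derive(Clone, Copy)]
struct NodeStats {
    maxcpu: f64,
    maxmem: f64,
    cpu: f64,
    mem: f64,
}

#[derive(Clone, Copy)]
struct ResourceStats {
    cpu: f64,
    mem: f64,
}

/// Rank node indices by the relative load each node would have after
/// placing the resource there (least loaded first).
fn score_nodes(nodes: &[NodeStats], res: ResourceStats) -> Vec<usize> {
    let load = |n: &NodeStats| {
        let cpu = (n.cpu + res.cpu) / n.maxcpu;
        let mem = (n.mem + res.mem) / n.maxmem;
        cpu.max(mem)
    };
    let mut order: Vec<usize> = (0..nodes.len()).collect();
    order.sort_by(|&a, &b| load(&nodes[a]).partial_cmp(&load(&nodes[b])).unwrap());
    order
}

fn main() {
    let nodes = [
        NodeStats { maxcpu: 24.0, maxmem: 48.0, cpu: 12.0, mem: 24.0 },
        NodeStats { maxcpu: 24.0, maxmem: 48.0, cpu: 2.0, mem: 4.0 },
    ];
    let order = score_nodes(&nodes, ResourceStats { cpu: 4.0, mem: 8.0 });
    // the less loaded second node ranks first
    assert_eq!(order, vec![1, 0]);
}
```

Because the scorer only depends on the generic stats structs, any caller
that can convert its own usage representation into them can reuse it.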




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation
  2026-03-30 14:30 ` [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-31  7:26   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  7:26 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

one tiny nit inline

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that usages need to construct the
> a Resource instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - inline bail! formatting variables
> - s/to_string/to_owned/ where reasonable
> - make Node::resources_iter(&self) return &str Iterator impl
> - drop add_resource_to_nodes() and remove_resource_from_nodes()
> - drop ResourcePlacement::nodenames() and Resource::nodenames()
> - drop Resource::moving_to()
> - fix behavior of add_resource_usage_to_node() for already added
>   resources: if the next nodename is non-existing, the resource would
> +  still be put into the moving state without being added to the node;
>   this is fixed now by improving the handling
> +- inline behavior of add_resource() to be more concise about how both
>   placement strategies are handled
> - no change in Resource::remove_node() documentation as I did not find a
>   better description in the meantime, but as it's internal it can be
>   improved later on as well
>
> test changes v2 -> v3:
> - use assertions whether nodes were added correctly in test cases
> - use assertions whether resource were added correctly in test cases
> +- additionally assert that a resource cannot be added to a non-existing
> +  node with add_resource_usage_to_node() and that this does not alter
> +  the state of that resource, as it did in v2
> - use assert!() instead of bail!() in test cases as much as appropriate
>

[snip]

> +    /// Add `stats` from resource with identifier `sid` to node `nodename` in cluster usage.
> +    ///
> +    /// For the first call, the resource is assumed to be started and stationary on the given node.
> +    /// If there was no intermediate call to remove the resource, the second call will assume that
> +    /// the given node is the target node and the resource is being moved there. The second call
> +    /// will ignore the value of `stats`.
> +    #[deprecated = "only for backwards compatibility, use add_resource(...) instead"]
> +    pub fn add_resource_usage_to_node(
> +        &mut self,
> +        nodename: &str,
> +        sid: &str,
> +        stats: ResourceStats,
> +    ) -> Result<(), Error> {
> +        if let Some(resource) = self.resources.remove(sid) {
> +            match resource.placement() {
> +                ResourcePlacement::Stationary { current_node } => {
> +                    let placement = ResourcePlacement::Moving {
> +                        current_node: current_node.to_owned(),
> +                        target_node: nodename.to_owned(),
> +                    };
> +                    let new_resource = Resource::new(resource.stats(), resource.state(), placement);
> +

nit: the logic in here is kinda tricky; adding an explanatory
comment would certainly help with understanding it

> +                    if let Err(err) = self.add_resource(sid.to_owned(), new_resource) {
> +                        self.add_resource(sid.to_owned(), resource)?;
> +
> +                        bail!(err);
> +                    }
> +
> +                    Ok(())
> +                }
> +                ResourcePlacement::Moving { target_node, .. } => {
> +                    bail!("resource '{sid}' is already moving to target node '{target_node}'")
> +                }
> +            }
> +        } else {
> +            let placement = ResourcePlacement::Stationary {
> +                current_node: nodename.to_owned(),
> +            };
> +            let resource = Resource::new(stats, ResourceState::Started, placement);
> +
> +            self.add_resource(sid.to_owned(), resource)
> +        }
> +    }

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
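
The placement state machine the quoted shim implements (Stationary on the
first call, Moving on a repeated call, an error once already moving) can be
sketched in isolation as below. The names loosely mirror the quoted code,
but this is a deliberate simplification that omits the stats handling and
the add_resource rollback discussed in the nit above:

```rust
// Hedged sketch of the placement transitions only; not the crate's
// actual Usage implementation.
use std::collections::HashMap;

#[derive(Debug, Clone)]
enum Placement {
    Stationary { current_node: String },
    Moving { current_node: String, target_node: String },
}

#[derive(Default)]
struct Usage {
    resources: HashMap<String, Placement>,
}

impl Usage {
    fn add_resource_usage_to_node(&mut self, node: &str, sid: &str) -> Result<(), String> {
        match self.resources.get(sid).cloned() {
            // first call: resource is stationary on the given node
            None => {
                let p = Placement::Stationary { current_node: node.to_owned() };
                self.resources.insert(sid.to_owned(), p);
                Ok(())
            }
            // second call: the given node becomes the migration target
            Some(Placement::Stationary { current_node }) => {
                let p = Placement::Moving { current_node, target_node: node.to_owned() };
                self.resources.insert(sid.to_owned(), p);
                Ok(())
            }
            // any further call is an error
            Some(Placement::Moving { target_node, .. }) => {
                Err(format!("resource '{sid}' is already moving to target node '{target_node}'"))
            }
        }
    }
}

fn main() {
    let mut usage = Usage::default();
    usage.add_resource_usage_to_node("node1", "vm:101").unwrap();
    usage.add_resource_usage_to_node("node2", "vm:101").unwrap();
    // third call must fail: the resource is already moving
    assert!(usage.add_resource_usage_to_node("node3", "vm:101").is_err());
}
```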




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection
  2026-03-30 14:30 ` [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-31  7:33   ` Dominik Rusovac
  2026-03-31 12:42   ` Michael Köppl
  1 sibling, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  7:33 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> Assuming that a resource will hold the same dynamic resource usage on a
> new node as on the previous node, score possible migrations, where:
>
> - the cluster node imbalance is minimal (bruteforce), or
> - the shifted root mean square and maximum resource usages of the cpu
>   and memory is minimal across the cluster nodes (TOPSIS).
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - fix wording in ScoredMigration::new() documentation
> - use f64::powi instead of f64::powf in ScoredMigration::new()
> - adapt wording in MigrationCandidate `stats` member documentation
> - only compare order of return value of
>   score_best_balancing_migration_candidates{,_topsis}() in test cases
>   instead of equal between the imbalance scores
> - introduce rank_best_balancing_migration_candidates{,_topsis}() for the
>   test cases for reduced code duplication
> - use assert! instead of bail! wherever appropriate
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
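
As a rough illustration of the brute-force variant described above: try
every (resource, target-node) pair and keep the migration that most reduces
the spread of per-node load. The standard-deviation metric, names, and
tie-breaking by iteration order are assumptions for illustration, not the
crate's actual imbalance formula or TOPSIS scoring.

```rust
// Hedged sketch only; the real crate scores migrations differently
// (shifted RMS / maximum usage, TOPSIS), per the commit message.

/// Spread of per-node loads (population standard deviation).
fn imbalance(loads: &[f64]) -> f64 {
    let avg = loads.iter().sum::<f64>() / loads.len() as f64;
    (loads.iter().map(|l| (l - avg).powi(2)).sum::<f64>() / loads.len() as f64).sqrt()
}

/// Returns (resource index, target node index) of the best single
/// migration, if any improves on the current imbalance.
fn best_migration(node_loads: &[f64], placements: &[usize], usages: &[f64]) -> Option<(usize, usize)> {
    let current = imbalance(node_loads);
    let mut best: Option<(usize, usize, f64)> = None;
    for (res, &src) in placements.iter().enumerate() {
        for target in 0..node_loads.len() {
            if target == src {
                continue;
            }
            // simulate the migration, assuming the resource keeps the
            // same usage on the target node as on the source node
            let mut loads = node_loads.to_vec();
            loads[src] -= usages[res];
            loads[target] += usages[res];
            let score = imbalance(&loads);
            if score < current && best.map_or(true, |(_, _, s)| score < s) {
                best = Some((res, target, score));
            }
        }
    }
    best.map(|(r, t, _)| (r, t))
}

fn main() {
    // two resources on node 0 (loads 0.5 and 0.25), nodes 1 and 2 idle
    let node_loads = [0.75, 0.0, 0.0];
    let placements = [0, 0];
    let usages = [0.5, 0.25];
    assert_eq!(best_migration(&node_loads, &placements, &usages), Some((0, 1)));
}
```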




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation
  2026-03-30 14:30 ` [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-31  7:40   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  7:40 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The proxmox_resource_scheduling crate provides a generic usage
> implementation, which is backwards compatible with the pve_static
> bindings. This reduces the static resource scheduling bindings to a
> slightly thinner wrapper.
>
> This also exposes the new `add_resource(...)` binding, which allows
> callers to add services with additional state other than the usage
> stats. It is exposed as `add_service(...)` to be consistent with the
> naming of the rest of the existing methods.
>
> Where it is sensible for the bindings, the documentation is extended
> with a link to the documentation of the underlying methods.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - require callers to handle the `current_node` not set invariant
>   themselves, as this is pve-ha-manager-specific behavior and simplifies
>   the logic a bit
> - s/FromInto/From/ for PveResource<T> impl
> - use kebab-case for (de)serialization of `PveResource<T>`
> - make node_stats closure variable mutable instead of shadowing it in
>   the closure body again in StartedResourceAggregator::aggregate()
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options
  2026-03-30 14:30 ` [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-31  7:52   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  7:52 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - clarify unit for ha-auto-rebalance-hold-duration as suggested by
>   @Jillian Morgan and @Dominik
> - "The threshold for cluster node imbalance" instead of "The threshold
>   for node load", where the latter is not really representative
> - further improve wording a bit, so another read would be very
>   appreciated!
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats
  2026-03-30 14:30 ` [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats Daniel Kral
@ 2026-03-31  7:58   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  7:58 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The running flag is needed to discriminate starting and started
> resources from each other, which is a required parameter for using the
> new add_service(...) method for the resource scheduling bindings.
>
> The HA Manager tracks whether HA resources are in 'started' state and
> whether the LRM acknowledged that these are running. For non-HA
> resources, the rrd_dump data contains a running flag for VM and CT
> guests.
>
> See the next patch for the usage implementations, which pass the
> running flag to the add_service(...) method, for more information about
> the details.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - change the patch message and summary such that it is clear that this
>   patch is only really relevant to make non-HA cluster resources also
>   have the running flag
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes
  2026-03-30 14:30 ` [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-03-31  8:12   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  8:12 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The pve_static (and upcoming pve_dynamic) bindings expose the new
> add_resource(...) method, which allows adding resources in a single call
> with the additional running flag.
>
> The running flag is needed to discriminate starting and started HA
> resources from each other, which is needed to correctly account for HA
> resources for the dynamic load usage implementation in the next patch.
>
> This is because for the dynamic load usage, any HA resource that is
> scheduled to start by the HA Manager in the same round will not be
> accounted for in the next call to score_nodes_to_start_resource(...).
> This is not a problem for the static load usage, because there the
> current node usages are derived from the started resources on every
> call already.
>
> Passing only the HA resources' 'state' property is not enough since the
> HA Manager will move any HA resource from the 'request_start' (or
> through other transient states such as 'request_start_balance' and a
> successful 'migrate'/'relocate') into the 'started' state.
>
> This 'started' state is then picked up by the HA resource's LRM, which
> will actually start the HA resource and if successful respond with a
> 'SUCCESS' LRM result. Only then will the HA Manager acknowledge this by
> adding the running flag to the HA resource's state.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - fix setting $running flag only if state 'started' and running is set
>   or for any non-started state where current_node is set
> - Additionally handle the case where only $target_node is set in
>   add_service(), which can only happen in specific cases; I have some
>   patches which inline this behavior in get_used_service_nodes() (should
>   be named something else later) to make this behavior more consise, but
>   that should be handled separately
> - change the $service property names to kebab-case
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler
  2026-03-30 14:30 ` [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler Daniel Kral
@ 2026-03-31  8:15   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  8:15 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> The dynamic usage scheduler allows the HA Manager to make scheduling
> decisions based on the current usage of the nodes and cluster resources
> in addition to the maximum usage stats as reported by the PVE::HA::Env
> implementation.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - add !defined(...) guards in PVE::HA::Usage::Dynamic::add_node(...) as
>   suggested by @Dominik
> - adapt comment in add_service() as suggested by @Dominik
> - did not change error messages as suggested by @Dominik, because these
>   are consistent with the one for Static; should be done in a separate
>   patch (series)
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases
  2026-03-30 14:30 ` [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-03-31  8:20   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  8:20 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases document the basic behavior of the scheduler using the
> dynamic usage information of the HA resources with rebalance-on-start
> being cleared and set respectively.
>
> As the mechanisms for the scheduler with static and dynamic usage
> information are mostly the same, these test cases verify only the
> essential parts, which are:
>
> - dynamic usage information is used correctly (for both test cases), and
> - repeatedly scheduling resources with score_nodes_to_start_service(...)
>   correctly simulates that the previously scheduled HA resources are
>   already started
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - adapt the test-crs-dynamic-rebalance1 test case to correctly handle
>   the change in score_nodes_to_start_resource() as done by changing how
>   $running is set in add_service_usage() a few patches prior
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
@ 2026-03-31  9:07   ` Dominik Rusovac
  2026-03-31  9:07   ` Michael Köppl
  2026-03-31 13:50   ` Daniel Kral
  2 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:07 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

one nit inline

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> If the automatic load balancing system is enabled, it checks whether the
> cluster node imbalance exceeds some user-defined threshold for some HA
> Manager rounds ("hold duration"). If the threshold is exceeded on
> consecutive HA Manager rounds, it will choose the best resource motion
> to improve the cluster node imbalance and queue it if it improves the
> imbalance by at least some user-defined relative amount ("margin").
>
> This patch introduces resource bundles, which ensure that HA resources
> in strict positive resource affinity rules are considered as a whole
> "bundle" instead of individual HA resources.
>
> Specifically, active and stationary resource bundles are resource
> bundles that have at least one resource running and all resources
> located on the same node. This distinction is needed as newly created
> strict positive resource affinity rules may still require some resource
> motions to enforce the rule.
>
> Additionally, the migration candidate generation prunes any target
> nodes, which do not adhere to the HA rules of these resource bundles
> before scoring these migration candidates.

nice work! also very nice idea to introduce resource bundles

>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

[snip]

> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 2576c762..0f8a03a6 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -59,10 +59,17 @@ sub new {
>  
>      my $self = bless {
>          haenv => $haenv,
> -        crs => {},
> +        crs => {
> +            auto_rebalance => {},
> +        },
>          last_rules_digest => '',
>          last_groups_digest => '',
>          last_services_digest => '',
> +        # used to track how many HA rounds the imbalance threshold has been exceeded
> +        #
> +        # this is not persisted for a CRM failover as in the mean time
> +        # the usage statistics might have change quite a bit already

nit:

           # the usage statistics might have change[d] quite a bit already

> +        sustained_imbalance_round => 0,
>          group_migration_round => 3, # wait a little bit
>      }, $class;
>  
> @@ -97,6 +104,13 @@ sub update_crs_scheduler_mode {
>      my $crs_cfg = $dc_cfg->{crs};
>  
>      $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
> +    $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
> +    $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.7;
> +    $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
> +        // 'bruteforce';
> +    $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
> +        // 3;
> +    $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
>  
>      my $old_mode = $self->{crs}->{scheduler};
>      my $new_mode = $crs_cfg->{ha} || 'basic';
> @@ -114,6 +128,150 @@ sub update_crs_scheduler_mode {
>      return;
>  }
>  
> +# Returns a hash of lists, which contain the running, non-moving HA resource
> +# bundles, which are on the same node, implied by the strict positive resource
> +# affinity rules.
> +#
> +# Each resource bundle has a leader, which is the alphabetically first running
> +# HA resource in the resource bundle and also the key of each resource bundle
> +# in the returned hash.
> +sub get_active_stationary_resource_bundles {
> +    my ($ss, $resource_affinity) = @_;
> +
> +    my $resource_bundles = {};
> +OUTER: for my $sid (sort keys %$ss) {
> +        # do not consider non-started resource as 'active' leading resource
> +        next if $ss->{$sid}->{state} ne 'started';
> +
> +        my @resources = ($sid);
> +        my $nodes = { $ss->{$sid}->{node} => 1 };
> +
> +        my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
> +        if (%$dependent_resources) {
> +            for my $csid (keys %$dependent_resources) {
> +                next if !defined($ss->{$csid});
> +                my ($state, $node) = $ss->{$csid}->@{qw(state node)};
> +
> +                # do not consider stationary bundle if a dependent resource moves
> +                next OUTER if $state eq 'migrate' || $state eq 'relocate';
> +                # do not add non-started resource to active bundle
> +                next if $state ne 'started';
> +
> +                $nodes->{$node} = 1;
> +
> +                push @resources, $csid;
> +            }
> +
> +            @resources = sort @resources;
> +        }
> +
> +        # skip resource bundles, which are not on the same node yet
> +        next if keys %$nodes > 1;
> +
> +        my $leader_sid = $resources[0];
> +
> +        $resource_bundles->{$leader_sid} = \@resources;
> +    }
> +
> +    return $resource_bundles;
> +}
> +
> +# Returns a hash of hashes, where each item contains the resource bundle's
> +# leader, the list of HA resources in the resource bundle, and the list of
> +# possible nodes to migrate to.
> +sub get_resource_migration_candidates {
> +    my ($self) = @_;
> +
> +    my ($ss, $compiled_rules, $online_node_usage) =
> +        $self->@{qw(ss compiled_rules online_node_usage)};
> +    my ($node_affinity, $resource_affinity) =
> +        $compiled_rules->@{qw(node-affinity resource-affinity)};
> +
> +    my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
> +
> +    my @compact_migration_candidates = ();
> +    for my $leader_sid (sort keys %$resource_bundles) {
> +        my $current_leader_node = $ss->{$leader_sid}->{node};
> +        my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
> +
> +        my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
> +        my ($together, $separate) =
> +            get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
> +        apply_negative_resource_affinity($separate, $target_nodes);
> +
> +        delete $target_nodes->{$current_leader_node};
> +
> +        next if !%$target_nodes;
> +
> +        push @compact_migration_candidates,
> +            {
> +                leader => $leader_sid,
> +                nodes => [sort keys %$target_nodes],
> +                resources => $resource_bundles->{$leader_sid},
> +            };
> +    }
> +
> +    return \@compact_migration_candidates;
> +}
> +
> +sub load_balance {
> +    my ($self) = @_;
> +
> +    my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
> +    my ($auto_rebalance_opts) = $crs->{auto_rebalance};
> +
> +    return if !$auto_rebalance_opts->{enable};
> +    return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
> +    return if $self->any_resource_motion_queued_or_running();
> +
> +    my ($threshold, $method, $hold_duration, $margin) =
> +        $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
> +
> +    my $imbalance = $online_node_usage->calculate_node_imbalance();
> +
> +    # do not load balance unless imbalance threshold has been exceeded
> +    # consecutively for $hold_duration calls to load_balance()
> +    if ($imbalance < $threshold) {
> +        $self->{sustained_imbalance_round} = 0;
> +        return;
> +    } else {
> +        $self->{sustained_imbalance_round}++;
> +        return if $self->{sustained_imbalance_round} < $hold_duration;
> +        $self->{sustained_imbalance_round} = 0;
> +    }
> +
> +    my $candidates = $self->get_resource_migration_candidates();
> +
> +    my $result;
> +    if ($method eq 'bruteforce') {
> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
> +    } elsif ($method eq 'topsis') {
> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
> +    }
> +
> +    # happens if $candidates is empty or $method isn't handled above
> +    return if !$result;
> +
> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
> +
> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
> +    return if $relative_change < $margin;
> +
> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
> +
> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
> +    my $cmd = "$task $sid $target";
> +

nit: tbh I racked my brain a bit to get how this rounding technique
worked; so pls at least add a comment to say that you're rounding or
use printf

> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
> +    $haenv->log(
> +        'info',
> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
> +    );
> +
> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
> +}
> +
>  sub cleanup {
>      my ($self) = @_;
>  
> @@ -466,6 +624,21 @@ sub queue_resource_motion {
>      }
>  }
>  
> +sub any_resource_motion_queued_or_running {
> +    my ($self) = @_;
> +
> +    my ($ss) = $self->@{qw(ss)};
> +
> +    for my $sid (keys %$ss) {
> +        my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
> +
> +        return 1 if $state eq 'migrate' || $state eq 'relocate';
> +        return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
> +    }
> +
> +    return 0;
> +}
> +
>  # read new crm commands and save them into crm master status
>  sub update_crm_commands {
>      my ($self) = @_;
> @@ -902,6 +1075,10 @@ sub manage {
>              return; # disarm active and progressing, skip normal service state machine
>          }
>          # disarm deferred - fall through but only process services in transient states
> +    } else {
> +        # load balance only if disarm is disabled as during a deferred disarm
> +        # the HA Manager should not introduce any new migrations
> +        $self->load_balance();
>      }

very thoughtful, nice

>  

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
  2026-03-31  9:07   ` Dominik Rusovac
@ 2026-03-31  9:07   ` Michael Köppl
  2026-03-31  9:16     ` Dominik Rusovac
  2026-03-31  9:42     ` Daniel Kral
  2026-03-31 13:50   ` Daniel Kral
  2 siblings, 2 replies; 72+ messages in thread
From: Michael Köppl @ 2026-03-31  9:07 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

2 comments inline

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:

[snip]

> +    my $candidates = $self->get_resource_migration_candidates();
> +
> +    my $result;
> +    if ($method eq 'bruteforce') {
> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
> +    } elsif ($method eq 'topsis') {
> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
> +    }
> +
> +    # happens if $candidates is empty or $method isn't handled above
> +    return if !$result;
> +
> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
> +
> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;

Since you get $imbalance from a function that returns 0.0 for the case
that the cluster load is perfectly balanced (?), you could run into
division by 0 here, no?

> +    return if $relative_change < $margin;
> +
> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
> +
> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
> +    my $cmd = "$task $sid $target";
> +
> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
> +    $haenv->log(
> +        'info',
> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
> +    );
> +
> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
> +}
> +

[snip]

>      $self->{all_lrms_disarmed} = 0;
> diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
> index 43feb041..659ab30a 100644
> --- a/src/PVE/HA/Usage.pm
> +++ b/src/PVE/HA/Usage.pm
> @@ -60,6 +60,40 @@ sub remove_service_usage {
>      die "implement in subclass";
>  }
>  
> +sub calculate_node_imbalance {
> +    my ($self) = @_;
> +
> +    die "implement in subclass";
> +}
> +
> +sub score_best_balancing_migrations {
> +    my ($self, $migration_candidates, $limit) = @_;
> +
> +    die "implement in subclass";
> +}
> +
> +sub select_best_balancing_migration {
> +    my ($self, $migration_candidates) = @_;
> +
> +    my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
> +
> +    return $migrations->[0];

If an error occurs in the following call in
score_best_balancing_migrations

    my $migrations = eval {
        $self->{scheduler}
            ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
    };

you'd return an undefined $migrations, which would result in a
dereference error here.

> +}
> +
> +sub score_best_balancing_migrations_topsis {
> +    my ($self, $migration_candidates, $limit) = @_;
> +
> +    die "implement in subclass";
> +}
> +
> +sub select_best_balancing_migration_topsis {
> +    my ($self, $migration_candidates) = @_;
> +
> +    my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
> +
> +    return $migrations->[0];
> +}
> +
>  # Returns a hash with $nodename => $score pairs. A lower $score is better.
>  sub score_nodes_to_start_service {
>      my ($self, $sid) = @_;
> diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
> index 24c85a41..76d0feaa 100644
> --- a/src/PVE/HA/Usage/Dynamic.pm
> +++ b/src/PVE/HA/Usage/Dynamic.pm
> @@ -104,6 +104,39 @@ sub remove_service_usage {
>      $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
>  }
>  

[snip]




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases
  2026-03-30 14:30 ` [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases Daniel Kral
@ 2026-03-31  9:09   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:09 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases document which resource bundles count as active and
> stationary and ensure that get_active_stationary_resource_bundles(...)
> does produce the correct active, stationary resource bundles.
>
> This is especially important, because these resource bundles are used
> for the load balancing candidate generation, which is passed to
> score_best_balancing_migration_candidates($candidates, ...). The
> PVE::HA::Usage::{Static,Dynamic} implementation validates these
> candidates and fails with a user-visible error message.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:07   ` Michael Köppl
@ 2026-03-31  9:16     ` Dominik Rusovac
  2026-03-31  9:32       ` Daniel Kral
  2026-03-31  9:42     ` Daniel Kral
  1 sibling, 1 reply; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:16 UTC (permalink / raw)
  To: Michael Köppl, Daniel Kral, pve-devel

On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
> 2 comments inline
>
> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>
> [snip]
>
>> +    my $candidates = $self->get_resource_migration_candidates();
>> +
>> +    my $result;
>> +    if ($method eq 'bruteforce') {
>> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
>> +    } elsif ($method eq 'topsis') {
>> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
>> +    }
>> +
>> +    # happens if $candidates is empty or $method isn't handled above
>> +    return if !$result;
>> +
>> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
>> +
>> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>
> Since you get $imbalance from a function that returns 0.0 for the case
> that the cluster load is perfectly balanced (?), you could run into
> division by 0 here, no?
>

technically this could happen; however, an imbalance of 0.0 certainly
should not exceed a threshold (this is the case where "the cluster load
is perfectly balanced"), so $relative_change ought never to be computed

>> +    return if $relative_change < $margin;
>> +
>> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
>> +
>> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
>> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
>> +    my $cmd = "$task $sid $target";
>> +
>> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
>> +    $haenv->log(
>> +        'info',
>> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
>> +    );
>> +
>> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
>> +}
>> +

[snip]




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:16     ` Dominik Rusovac
@ 2026-03-31  9:32       ` Daniel Kral
  2026-03-31  9:39         ` Dominik Rusovac
  0 siblings, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-31  9:32 UTC (permalink / raw)
  To: Dominik Rusovac, Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 11:16 AM CEST, Dominik Rusovac wrote:
> On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
>> 2 comments inline
>>
>> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>>
>> [snip]
>>
>>> +    my $candidates = $self->get_resource_migration_candidates();
>>> +
>>> +    my $result;
>>> +    if ($method eq 'bruteforce') {
>>> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
>>> +    } elsif ($method eq 'topsis') {
>>> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
>>> +    }
>>> +
>>> +    # happens if $candidates is empty or $method isn't handled above
>>> +    return if !$result;
>>> +
>>> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
>>> +
>>> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>>
>> Since you get $imbalance from a function that returns 0.0 for the case
>> that the cluster load is perfectly balanced (?), you could run into
>> division by 0 here, no?
>>
>
> technically this could happen, however an imbalance of 0.0 certainly
> should not exceed a threshold (this is the case that "the cluster load
> is perfectly balanced"); so the $relative_change ought to be never computed 
>

Good catch, thanks to you both!

Even though it's impractical, users can still set the threshold to 0.0,
which could actually cause a division by zero here, because the
threshold is compared with a >= relation.



    # cat imbalance-zero.pl
    #!/usr/bin/perl
    
    use v5.36;
    
    my $imbalance = 0.0;
    my $threshold = 0.0;
    my $hold_duration = 3;
    my $sustained_imbalance_round = 0;
    
    sub test {
        if ($imbalance < $threshold) {
            $sustained_imbalance_round = 0;
            return;
        } else {
            $sustained_imbalance_round++;
            print "imbalance threshold exceeded\n";
            return if $sustained_imbalance_round < $hold_duration;
            print "sustained high imbalance\n";
            $sustained_imbalance_round = 0;
        }
    
        my $target_imbalance = 0.0;
        my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
    }
    
    test();
    test();
    test();
    # chmod +x imbalance-zero.pl
    # ./imbalance-zero.pl
    imbalance threshold exceeded
    imbalance threshold exceeded
    imbalance threshold exceeded
    sustained high imbalance
    Illegal division by zero at ./imbalance-zero.pl line 23.



The system is rather unstable in that regard anyway (same if $margin =
0.0), because it always tries to load balance every $hold_duration HA
rounds.

I'm not sure whether we should prevent this by adjusting the range for
both the threshold and margin to be at least larger than some minimum
value, so that the load balancing system won't become unstable.

>>> +    return if $relative_change < $margin;
>>> +
>>> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
>>> +
>>> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
>>> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
>>> +    my $cmd = "$task $sid $target";
>>> +
>>> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
>>> +    $haenv->log(
>>> +        'info',
>>> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
>>> +    );
>>> +
>>> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
>>> +}
>>> +
>
> [snip]





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system test cases
  2026-03-30 14:30 ` [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-03-31  9:33   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:33 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this for basic testing of the rebalancing system

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases document the basic behavior of the automatic load
> rebalancer using the dynamic usage stats.
>
> As an overview:
>
> - Case 0: rebalancing system is inactive for no configured HA resources
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
>           for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
>           resources cause a significant node imbalance and converge if
>           the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
>           resources cause a significant node imbalance through dynamic
>           changes in their usage
> - Case 4: rebalancing system doesn't trigger a migration if the node
>           imbalance is exceeded once but isn't sustained for at least
>           the set hold duration
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:32       ` Daniel Kral
@ 2026-03-31  9:39         ` Dominik Rusovac
  2026-03-31 13:55           ` Daniel Kral
  0 siblings, 1 reply; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:39 UTC (permalink / raw)
  To: Daniel Kral, Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 11:32 AM CEST, Daniel Kral wrote:
> On Tue Mar 31, 2026 at 11:16 AM CEST, Dominik Rusovac wrote:
>> On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
>>> 2 comments inline
>>>
>>> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>>>
>>> [snip]
>>>
>>>> +    my $candidates = $self->get_resource_migration_candidates();
>>>> +
>>>> +    my $result;
>>>> +    if ($method eq 'bruteforce') {
>>>> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
>>>> +    } elsif ($method eq 'topsis') {
>>>> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
>>>> +    }
>>>> +
>>>> +    # happens if $candidates is empty or $method isn't handled above
>>>> +    return if !$result;
>>>> +
>>>> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
>>>> +
>>>> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>>>
>>> Since you get $imbalance from a function that returns 0.0 for the case
>>> that the cluster load is perfectly balanced (?), you could run into
>>> division by 0 here, no?
>>>
>>
>> technically this could happen; however, an imbalance of 0.0 certainly
>> should not exceed a threshold (this is the case where "the cluster load
>> is perfectly balanced"), so the $relative_change ought never to be computed
>>
>
> Good catch, thanks to you both!
>
> Even though it's impractical, users can still set the threshold to 0.0,
> which could actually cause a division by zero here, because the
> threshold is compared by a >= relation.
>
>
>
>     # cat imbalance-zero.pl
>     #!/usr/bin/perl
>     
>     use v5.36;
>     
>     my $imbalance = 0.0;
>     my $threshold = 0.0;
>     my $hold_duration = 3;
>     my $sustained_imbalance_round = 0;
>     
>     sub test {
>         if ($imbalance < $threshold) {
>             $sustained_imbalance_round = 0;
>             return;
>         } else {
>             $sustained_imbalance_round++;
>             print "imbalance threshold exceeded\n";
>             return if $sustained_imbalance_round < $hold_duration;
>             print "sustained high imbalance\n";
>             $sustained_imbalance_round = 0;
>         }
>     
>         my $target_imbalance = 0.0;
>         my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>     }
>     
>     test();
>     test();
>     test();
>     # chmod +x imbalance-zero.pl
>     # ./imbalance-zero.pl
>     imbalance threshold exceeded
>     imbalance threshold exceeded
>     imbalance threshold exceeded
>     sustained high imbalance
>     Illegal division by zero at ./imbalance-zero.pl line 23.
>
>
>
> The system is rather unstable in that regard anyway (same if $margin =
> 0.0), because it always tries to load balance every $hold_duration HA
> rounds.
>
> I'm not sure whether we should prevent this with adjusting the range for
> both the threshold and margin to be at least larger than some minimum
> value, so that the load balancing system won't become unstable.
>

yeah, either this, or you add a guard to return early (before the
threshold guard) whenever imbalance is 0.0, I guess
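the early-return guard suggested here could be sketched roughly as
follows; an illustrative Rust version of the logic, not the patch's
actual Perl code (function name and `Option` return are my own):

```rust
/// Compute the relative imbalance improvement of a candidate migration,
/// returning None when the cluster is already perfectly balanced so the
/// division by $imbalance can never divide by zero.
fn relative_change(imbalance: f64, target_imbalance: f64) -> Option<f64> {
    // a perfectly balanced cluster has nothing to rebalance, so bail out
    // before the threshold check is even reached
    if imbalance == 0.0 {
        return None;
    }
    Some((imbalance - target_imbalance) / imbalance)
}
```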

>>>> +    return if $relative_change < $margin;
>>>> +
>>>> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
>>>> +
>>>> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
>>>> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
>>>> +    my $cmd = "$task $sid $target";
>>>> +
>>>> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
>>>> +    $haenv->log(
>>>> +        'info',
>>>> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
>>>> +    );
>>>> +
>>>> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
>>>> +}
>>>> +
>>
>> [snip]





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:07   ` Michael Köppl
  2026-03-31  9:16     ` Dominik Rusovac
@ 2026-03-31  9:42     ` Daniel Kral
  2026-03-31 11:01       ` Michael Köppl
  1 sibling, 1 reply; 72+ messages in thread
From: Daniel Kral @ 2026-03-31  9:42 UTC (permalink / raw)
  To: Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>> diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
>> index 43feb041..659ab30a 100644
>> --- a/src/PVE/HA/Usage.pm
>> +++ b/src/PVE/HA/Usage.pm
>> @@ -60,6 +60,40 @@ sub remove_service_usage {
>>      die "implement in subclass";
>>  }
>>  
>> +sub calculate_node_imbalance {
>> +    my ($self) = @_;
>> +
>> +    die "implement in subclass";
>> +}
>> +
>> +sub score_best_balancing_migrations {
>> +    my ($self, $migration_candidates, $limit) = @_;
>> +
>> +    die "implement in subclass";
>> +}
>> +
>> +sub select_best_balancing_migration {
>> +    my ($self, $migration_candidates) = @_;
>> +
>> +    my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
>> +
>> +    return $migrations->[0];
>
> If an error occurs in the following call in
> score_best_balancing_migrations
>
>     my $migrations = eval {
>         $self->{scheduler}
>             ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
>     };
>
> you'd return an undefined $migrations, which would result in a
> dereference error here.
>

Hm, I can't seem to reproduce it, even if I just `return undef` in
score_best_balancing_migrations{,_topsis}(). Before writing this line, I
tested it with:

    # cat perl-array-empty.pl
    #!/usr/bin/perl
    
    use v5.36;
    use Data::Dumper;
    
    my $array1 = undef;
    my $array2 = [];
    my $array3 = [{}];
    my $array4 = [{something => 'a'}, {else => 'b'}];
    
    my $i = 1;
    for my $var (($array1, $array2, $array3, $array4)) {
        print "array$i value: " . Dumper($var);
        print "array$i first: " . Dumper($var->[0]);
    
        $i++;
    }
    # ./perl-array-empty.pl
    array1 value: $VAR1 = undef;
    array1 first: $VAR1 = undef;
    array2 value: $VAR1 = [];
    array2 first: $VAR1 = undef;
    array3 value: $VAR1 = [
      {}
    ];
    array3 first: $VAR1 = {};
    array4 value: $VAR1 = [
      {
        'something' => 'a'
      },
      {
        'else' => 'b'
      }
    ];
    array4 first: $VAR1 = {
      'something' => 'a'
    };

Or do you have another reproducer for this?

>> +}
>> +
>> +sub score_best_balancing_migrations_topsis {
>> +    my ($self, $migration_candidates, $limit) = @_;
>> +
>> +    die "implement in subclass";
>> +}
>> +
>> +sub select_best_balancing_migration_topsis {
>> +    my ($self, $migration_candidates) = @_;
>> +
>> +    my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
>> +
>> +    return $migrations->[0];
>> +}
>> +
>>  # Returns a hash with $nodename => $score pairs. A lower $score is better.
>>  sub score_nodes_to_start_service {
>>      my ($self, $sid) = @_;
>> diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
>> index 24c85a41..76d0feaa 100644
>> --- a/src/PVE/HA/Usage/Dynamic.pm
>> +++ b/src/PVE/HA/Usage/Dynamic.pm
>> @@ -104,6 +104,39 @@ sub remove_service_usage {
>>      $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
>>  }
>>  
>
> [snip]





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 38/40] test: add static automatic rebalancing system test cases
  2026-03-30 14:30 ` [PATCH ha-manager v3 38/40] test: add static " Daniel Kral
@ 2026-03-31  9:44   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:44 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider these basic tests

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases are derivatives of the dynamic automatic rebalancing
> system test cases 1 to 3, which ensure that the same basic functionality
> is provided with the automatic rebalancing system with static usage
> information.
>
> The other dynamic usage test cases are not included here, because these
> are invariant to the provided usage information and only test further
> edge cases.
>
> As an overview:
>
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
>           for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
>           resources cause a significant node imbalance and converge if
>           the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
>           resources cause a significant node imbalance through changes
>           in their static usage
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method
  2026-03-30 14:30 ` [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-03-31  9:48   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31  9:48 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases are clones of the dynamic automatic rebalancing system
> test cases 0 through 4, which ensure that the same basic functionality
> is provided with the automatic rebalancing system using the TOPSIS
> method.
>
> The expected outputs are exactly the same, but for test case 3, which
> changes the second migration from
>
>     vm:103 to node1 with an expected target imbalance of 0.40
>
> to
>
>     vm:103 to node3 with an expected target imbalance of 0.43.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules
  2026-03-30 14:30 ` [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
@ 2026-03-31 10:06   ` Dominik Rusovac
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Rusovac @ 2026-03-31 10:06 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

lgtm, consider this

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> These test cases document and verify some behaviors of the automatic
> rebalancing system in combination with HA affinity rules.
>
> All of these test cases use only the dynamic usage information and
> bruteforce method as the waiting on ongoing migrations and candidate
> generation are invariant to those parameters.
>
> As an overview:
>
> - Case 1: rebalancing system acknowledges node affinity rules
> - Case 2: rebalancing system considers HA resources in strict positive
>           resource affinity rules as a single unit (a resource bundle)
>           and will not split them apart
> - Case 3: rebalancing system will wait on the migration of a not-yet
>           enforced strict positive resource affinity rule, i.e., the
>           HA resources still need to migrate to their common node

nice test case

> - Case 4: rebalancing system will acknowledge strict negative resource
>           affinity rules, but will still try to minimize the node
>           imbalance as much as possible

also nice test case

>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v2 -> v3:
> - none
>

[snip]

Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:42     ` Daniel Kral
@ 2026-03-31 11:01       ` Michael Köppl
  0 siblings, 0 replies; 72+ messages in thread
From: Michael Köppl @ 2026-03-31 11:01 UTC (permalink / raw)
  To: Daniel Kral, Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 11:42 AM CEST, Daniel Kral wrote:

[snip]

>     # ./perl-array-empty.pl
>     array1 value: $VAR1 = undef;
>     array1 first: $VAR1 = undef;
>     array2 value: $VAR1 = [];
>     array2 first: $VAR1 = undef;
>     array3 value: $VAR1 = [
>       {}
>     ];
>     array3 first: $VAR1 = {};
>     array4 value: $VAR1 = [
>       {
>         'something' => 'a'
>       },
>       {
>         'else' => 'b'
>       }
>     ];
>     array4 first: $VAR1 = {
>       'something' => 'a'
>     };
>
> Or do you have another reproducer for this?
>

As discussed off-list, my intuition failed me here. I would've expected
this to not work, but of course autovivification makes it work...

>>> +}
>>> +
>>> +sub score_best_balancing_migrations_topsis {
>>> +    my ($self, $migration_candidates, $limit) = @_;
>>> +
>>> +    die "implement in subclass";
>>> +}
>>> +
>>> +sub select_best_balancing_migration_topsis {
>>> +    my ($self, $migration_candidates) = @_;
>>> +
>>> +    my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
>>> +
>>> +    return $migrations->[0];
>>> +}
>>> +
>>>  # Returns a hash with $nodename => $score pairs. A lower $score is better.
>>>  sub score_nodes_to_start_service {
>>>      my ($self, $sid) = @_;
>>> diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
>>> index 24c85a41..76d0feaa 100644
>>> --- a/src/PVE/HA/Usage/Dynamic.pm
>>> +++ b/src/PVE/HA/Usage/Dynamic.pm
>>> @@ -104,6 +104,39 @@ sub remove_service_usage {
>>>      $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
>>>  }
>>>  
>>
>> [snip]





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection
  2026-03-30 14:30 ` [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
  2026-03-31  7:33   ` Dominik Rusovac
@ 2026-03-31 12:42   ` Michael Köppl
  2026-03-31 13:32     ` Daniel Kral
  1 sibling, 1 reply; 72+ messages in thread
From: Michael Köppl @ 2026-03-31 12:42 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:

[snip]

>  
> +    /// Adds the resource stats to the node stats as if the resource is running on the node.
> +    pub fn add_running_resource(&mut self, resource_stats: &ResourceStats) {
> +        self.cpu += resource_stats.cpu;
> +        self.mem += resource_stats.mem;
> +    }
> +
> +    /// Removes the resource stats from the node stats as if the resource is not running on the node.
> +    pub fn remove_running_resource(&mut self, resource_stats: &ResourceStats) {
> +        self.cpu -= resource_stats.cpu;

From what I can gather, due to how the stats are collected, it could
occur here that self.cpu < 0. I think it could make sense to do
something like

    self.cpu = f64::max(0.0, self.cpu - resource_stats.cpu);

here to avoid it affecting the node imbalance calculation.

> +        self.mem = self.mem.saturating_sub(resource_stats.mem);
> +    }
> +
>      /// Returns the current cpu usage as a percentage.
>      pub fn cpu_load(&self) -> f64 {
>          self.cpu / self.maxcpu as f64
> @@ -38,6 +50,11 @@ impl NodeStats {
>      pub fn mem_load(&self) -> f64 {
>          self.mem as f64 / self.maxmem as f64
>      }
> +
> +    /// Returns a combined node usage as a percentage.
> +    pub fn load(&self) -> f64 {
> +        (self.cpu_load() + self.mem_load()) / 2.0
> +    }
>  }

[snip]
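the suggested clamping could be sketched like this; `NodeStats` here is
a minimal stand-in with illustrative fields, not the actual struct from
the patch:

```rust
/// Minimal stand-in for the node usage stats discussed above.
struct NodeStats {
    cpu: f64,
    mem: u64,
}

impl NodeStats {
    /// Remove a resource's usage, clamping cpu at 0.0 so slightly stale
    /// resource stats can't drive the node's usage negative and skew the
    /// node imbalance calculation.
    fn remove_running_resource(&mut self, res_cpu: f64, res_mem: u64) {
        self.cpu = f64::max(0.0, self.cpu - res_cpu);
        // unsigned memory already saturates at zero
        self.mem = self.mem.saturating_sub(res_mem);
    }
}
```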




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats
  2026-03-30 14:30 ` [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-31 13:25   ` Daniel Kral
  0 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-31 13:25 UTC (permalink / raw)
  To: Daniel Kral, pve-devel; +Cc: Thomas Lamprecht

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
> index 04cd1bfe..b2488ddd 100644
> --- a/src/PVE/HA/Env/PVE2.pm
> +++ b/src/PVE/HA/Env/PVE2.pm
> @@ -588,6 +625,32 @@ sub get_static_node_stats {
>      return $stats;
>  }
>  
> +sub get_dynamic_node_stats {
> +    my ($self) = @_;
> +
> +    my $rrd = PVE::Cluster::rrd_dump();
> +
> +    my $stats = {};
> +    for my $key (keys %$rrd) {
> +        my ($nodename) = $key =~ m/^pve-node-9.0\/(\w+)$/;

Just noticed this while running on another setup: `(\w+)` should be
replaced with an actual regex for valid nodenames, e.g.
`([a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)` from PVE::JSONSchema, or
even just `(\S+)`, as we don't need to verify the nodename here.

e.g., 'pve-node1' would not match, as the character '-' is not an
element of \w.

I'll wait for other test results before sending a v4 for this and the
other feedback.
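the JSONSchema-style nodename pattern above could also be written as a
plain predicate; this Rust sketch is only illustrative of which names
match (the actual code stays a Perl regex):

```rust
/// Check a nodename against the shape of the PVE::JSONSchema pattern
/// `[a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?`: first and last characters
/// must be alphanumeric, inner characters may also be '-'.
fn is_valid_nodename(s: &str) -> bool {
    match s.as_bytes() {
        [] => false,
        [b] => b.is_ascii_alphanumeric(),
        [first, middle @ .., last] => {
            first.is_ascii_alphanumeric()
                && last.is_ascii_alphanumeric()
                && middle.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'-')
        }
    }
}
```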

> +
> +        next if !$nodename;
> +
> +        my $rrdentry = $rrd->{$key} // [];
> +
> +        my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
> +
> +        $stats->{$nodename} = {
> +            maxcpu => $maxcpu,
> +            cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
> +            maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
> +            mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
> +        };
> +    }
> +
> +    return $stats;
> +}
> +
>  sub get_node_version {
>      my ($self, $node) = @_;
>  





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection
  2026-03-31 12:42   ` Michael Köppl
@ 2026-03-31 13:32     ` Daniel Kral
  0 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-31 13:32 UTC (permalink / raw)
  To: Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 2:42 PM CEST, Michael Köppl wrote:
> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>
> [snip]
>
>>  
>> +    /// Adds the resource stats to the node stats as if the resource is running on the node.
>> +    pub fn add_running_resource(&mut self, resource_stats: &ResourceStats) {
>> +        self.cpu += resource_stats.cpu;
>> +        self.mem += resource_stats.mem;
>> +    }
>> +
>> +    /// Removes the resource stats from the node stats as if the resource is not running on the node.
>> +    pub fn remove_running_resource(&mut self, resource_stats: &ResourceStats) {
>> +        self.cpu -= resource_stats.cpu;
>
> From what I can gather, due to how the stats are collected, it could
> occur here that self.cpu < 0. I think it could make sense to do
> something like
>
>     self.cpu = f64::max(0.0, self.cpu - resource_stats.cpu);
>
> here to avoid it affecting the node imbalance calculation.
>

Right, thanks!

Even though this shouldn't happen regularly, because

- in the static case, node.cpu is the sum of the 'cpu' properties of
  all already running resources, and
- in the dynamic case, node.cpu >= (sum of resources' 'cpu'), as it
  should at least include the reported resources' 'cpu' plus more from
  other processes running on the host (e.g. pve backend, zfs, etc.),

this would be the safer option regardless. I did a quick benchmark with
a theoretical cluster of 10,000 resources on 48 nodes and it did not
noticeably worsen the runtime AFAICT.

I'll wait on more feedback for a v4 or send it as a follow-up depending
on how we apply these patches.

>> +        self.mem = self.mem.saturating_sub(resource_stats.mem);
>> +    }
>> +
>>      /// Returns the current cpu usage as a percentage.
>>      pub fn cpu_load(&self) -> f64 {
>>          self.cpu / self.maxcpu as f64
>> @@ -38,6 +50,11 @@ impl NodeStats {
>>      pub fn mem_load(&self) -> f64 {
>>          self.mem as f64 / self.maxmem as f64
>>      }
>> +
>> +    /// Returns a combined node usage as a percentage.
>> +    pub fn load(&self) -> f64 {
>> +        (self.cpu_load() + self.mem_load()) / 2.0
>> +    }
>>  }
>
> [snip]





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
  2026-03-31  9:07   ` Dominik Rusovac
  2026-03-31  9:07   ` Michael Köppl
@ 2026-03-31 13:50   ` Daniel Kral
  2 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-31 13:50 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> +sub load_balance {
> +    my ($self) = @_;
> +
> +    my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
> +    my ($auto_rebalance_opts) = $crs->{auto_rebalance};
> +
> +    return if !$auto_rebalance_opts->{enable};
> +    return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';

We do not implement the load-balancing-related methods for
PVE::HA::Usage::Basic.

If for some reason recompute_online_node_usage() falls back to using
PVE::HA::Usage::Basic instead of the selected 'static' or 'dynamic' crs
mode, then the guarantee here does not hold and the first call (here
$online_node_usage->calculate_node_imbalance()) will fail.

recompute_online_node_usage() could easily fail, e.g. if, in 'dynamic'
mode, one node does not have pvestatd running.
PVE::HA::Usage::Dynamic::add_node() will then fail, because there's no
recent node usage data for all nodes to correctly represent the
cluster's node usage.

I'll fix this either by changing the $self->{crs}->{mode} value as we
fall back, or at least by adding sensible implementations for Basic as
well, such as 0.0 for node_imbalance and empty lists for the
score_best_balancing_migrations{,_topsis}() methods, or even both.
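the "sensible defaults for Basic" idea could be sketched with default
methods; an illustrative Rust analogue only, since the actual
pve-ha-manager code is Perl and these trait/method names are mine:

```rust
/// Usage model whose load-balancing hooks default to "perfectly
/// balanced, no candidates", so a fallback implementation that carries
/// no usage data never dies on these calls.
trait Usage {
    fn calculate_node_imbalance(&self) -> f64 {
        0.0 // no data: report a balanced cluster
    }
    fn score_best_balancing_migrations(&self) -> Vec<String> {
        Vec::new() // ...and propose no migrations
    }
}

struct Basic;
impl Usage for Basic {} // inherits the safe defaults
```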

> +    return if $self->any_resource_motion_queued_or_running();
> +
> +    my ($threshold, $method, $hold_duration, $margin) =
> +        $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
> +
> +    my $imbalance = $online_node_usage->calculate_node_imbalance();
> +
> +    # do not load balance unless imbalance threshold has been exceeded
> +    # consecutively for $hold_duration calls to load_balance()
> +    if ($imbalance < $threshold) {
> +        $self->{sustained_imbalance_round} = 0;
> +        return;
> +    } else {
> +        $self->{sustained_imbalance_round}++;
> +        return if $self->{sustained_imbalance_round} < $hold_duration;
> +        $self->{sustained_imbalance_round} = 0;
> +    }
> +
> +    my $candidates = $self->get_resource_migration_candidates();
> +
> +    my $result;
> +    if ($method eq 'bruteforce') {
> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
> +    } elsif ($method eq 'topsis') {
> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
> +    }
> +
> +    # happens if $candidates is empty or $method isn't handled above
> +    return if !$result;
> +
> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
> +
> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
> +    return if $relative_change < $margin;
> +
> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
> +
> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
> +    my $cmd = "$task $sid $target";
> +
> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
> +    $haenv->log(
> +        'info',
> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
> +    );
> +
> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
> +}




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
  2026-03-31  9:39         ` Dominik Rusovac
@ 2026-03-31 13:55           ` Daniel Kral
  0 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-03-31 13:55 UTC (permalink / raw)
  To: Dominik Rusovac, Michael Köppl, pve-devel

On Tue Mar 31, 2026 at 11:39 AM CEST, Dominik Rusovac wrote:
> On Tue Mar 31, 2026 at 11:32 AM CEST, Daniel Kral wrote:
>> Good catch, thanks to you both!
>>
>> Even though it's impractical, users can still set the threshold to 0.0,
>> which could actually cause a division by zero here, because the
>> threshold is compared by a >= relation.
>>

[...]

>>
>> The system is rather unstable in that regard anyway (same if $margin =
>> 0.0), because it always tries to load balance every $hold_duration HA
>> rounds.
>>
>> I'm not sure whether we should prevent this with adjusting the range for
>> both the threshold and margin to be at least larger than some minimum
>> value, so that the load balancing system won't become unstable.
>>
>
> yeah, either this, or you add a guard to return early (before the
> threshold guard) whenever imbalance is 0.0, I guess

As discussed off-list, I think it's reasonable to allow an imbalance
$threshold of 0.0, but even then a current $imbalance of 0.0 shouldn't
trigger that, so I'll change the comparison above to a '>' relation
instead of a '>=' relation.

An imbalance $threshold = 0.0 could mean 'always try to find a load
balancing migration'. Though very aggressive, it does no harm in itself
and doesn't mean the load balancing system will actually commit to a
rebalancing migration.

However, setting $margin = 0.0 means that any migration, even one that
doesn't change the imbalance at all, will be committed. But this is
ultimately a user configuration; we should check in the datacenter
config that the values aren't invalid (e.g. negative) and note in the
{verbose_,}description that a value of 0.0 might not be what users
want, though it is still allowed.
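the difference between the two trigger relations at a threshold of 0.0
can be shown in isolation; a toy Rust sketch, not the patch's Perl code:

```rust
/// Original trigger: proceeds when imbalance >= threshold, so an
/// imbalance of 0.0 "exceeds" a threshold of 0.0 and the
/// division-by-zero path is reachable.
fn triggers_ge(imbalance: f64, threshold: f64) -> bool {
    imbalance >= threshold
}

/// Proposed trigger: a strict '>' relation, so a perfectly balanced
/// cluster (imbalance 0.0) never triggers, even at threshold 0.0.
fn triggers_gt(imbalance: f64, threshold: f64) -> bool {
    imbalance > threshold
}
```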




^ permalink raw reply	[flat|nested] 72+ messages in thread

* partially-applied: [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (39 preceding siblings ...)
  2026-03-30 14:30 ` [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
@ 2026-03-31 20:44 ` Thomas Lamprecht
  2026-04-02 12:55 ` superseded: " Daniel Kral
  41 siblings, 0 replies; 72+ messages in thread
From: Thomas Lamprecht @ 2026-03-31 20:44 UTC (permalink / raw)
  To: pve-devel, Daniel Kral

On Mon, 30 Mar 2026 16:30:09 +0200, Daniel Kral wrote:
> Most of the patches for the v3 are already R-b'd by @Dominik (many
> thanks!). A lot less has changed than from v1 -> v2, I've still added
> per-patch changelogs to make reviewing the rest more straightforward.
> 
> Most changes went into
> 
> - #05 - resource-scheduling: implement generic cluster usage
>         implementation
> - #09 - resource-scheduling: implement rebalancing migration selection
> - #13 - pve-rs: resource-scheduling: use generic usage implementation
> - #19 - datacenter config: add auto rebalancing options
> - #30 - usage: use add_service to add service usage to nodes
> 
> [...]

Applied the proxmox and pve-rs parts, waiting for a v4 for the rest, thanks!

proxmox:

[1/9] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
      commit: d458ef65736a041ec58ca59d6c803c33670e534a
[2/9] resource-scheduling: move score_nodes_to_start_service to scheduler crate
      commit: 6e538346c302113f2a611dfa1d83147679d51dca
[3/9] resource-scheduling: rename service to resource where appropriate
      commit: d90d70dec2f55fdd93b491ebc982a318fa3850a0
[4/9] resource-scheduling: introduce generic scheduler implementation
      commit: 83bca6f691bf5211f366f88b07c1e5d0154054f7
[5/9] resource-scheduling: implement generic cluster usage implementation
      commit: 02892ae5f610e17d09e220542482803bc957232e
[6/9] resource-scheduling: topsis: handle empty criteria without panics
      commit: 3757312486e536c7dc67c6ca35d246573f8bf9b9
[7/9] resource-scheduling: compare by nodename in score_nodes_to_start_resource
      commit: f05d59e81c55508483099194e5b4274dba690fad
[8/9] resource-scheduling: factor out topsis alternative mapping
      commit: 124eebc7e31fc997d83a1e790d8325f61c3f7036
[9/9] resource-scheduling: implement rebalancing migration selection
      commit: 19a7e0dabc2784b004659c2b837dae1210c380cb

pve-rs:

[1/7] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
      commit: 166f321127f0c72b2a3122abf92757fb135188d3
[2/7] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
      commit: 565f427d4d0ea0cd2d980f6d851b3f6bf52ac1e3
[3/7] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
      commit: 1bac15f2b622bbb703fb69e0964671dd51cd91c2
[4/7] pve-rs: resource-scheduling: use generic usage implementation
      commit: c85a0ccb353624940f3366edb948a87504fd211c
[5/7] pve-rs: resource-scheduling: static: replace deprecated usage structs
      commit: e1ab8dad3012e5175025634a9df41fe31b3af92b
[6/7] pve-rs: resource-scheduling: implement pve_dynamic bindings
      commit: a14036cd58935aad1c75539c3fa9db37f5e4d60f
[7/7] pve-rs: resource-scheduling: expose auto rebalancing methods
      commit: a658483556345b2e1473930b6cad9872fd2d9d8c




^ permalink raw reply	[flat|nested] 72+ messages in thread

* superseded: [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer
  2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
                   ` (40 preceding siblings ...)
  2026-03-31 20:44 ` partially-applied: [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Thomas Lamprecht
@ 2026-04-02 12:55 ` Daniel Kral
  41 siblings, 0 replies; 72+ messages in thread
From: Daniel Kral @ 2026-04-02 12:55 UTC (permalink / raw)
  To: Daniel Kral, pve-devel

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> Most of the patches for the v3 are already R-b'd by @Dominik (many
> thanks!). A lot less has changed than from v1 -> v2, I've still added
> per-patch changelogs to make reviewing the rest more straightforward.

Superseded-by: https://lore.proxmox.com/pve-devel/20260402124817.416232-1-d.kral@proxmox.com/




^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2026-04-02 12:55 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-30 14:30 [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-30 14:30 ` [PATCH proxmox v3 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
2026-03-31  6:01   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH proxmox v3 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
2026-03-31  6:01   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH proxmox v3 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
2026-03-31  6:02   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH proxmox v3 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
2026-03-31  6:11   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH proxmox v3 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
2026-03-31  7:26   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH proxmox v3 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
2026-03-30 14:30 ` [PATCH proxmox v3 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
2026-03-30 14:30 ` [PATCH proxmox v3 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
2026-03-30 14:30 ` [PATCH proxmox v3 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
2026-03-31  7:33   ` Dominik Rusovac
2026-03-31 12:42   ` Michael Köppl
2026-03-31 13:32     ` Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
2026-03-31  7:40   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH perl-rs v3 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
2026-03-30 14:30 ` [PATCH perl-rs v3 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
2026-03-30 14:30 ` [PATCH cluster v3 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
2026-03-30 14:30 ` [PATCH cluster v3 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
2026-03-30 14:30 ` [PATCH cluster v3 19/40] datacenter config: add auto rebalancing options Daniel Kral
2026-03-31  7:52   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
2026-03-31 13:25   ` Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 21/40] sim: hardware: pass correct types for static stats Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 22/40] sim: hardware: factor out static stats' default values Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 23/40] sim: hardware: fix static stats guard Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 24/40] sim: hardware: handle dynamic service stats Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 27/40] usage: pass service data to add_service_usage Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 29/40] add running flag to non-HA cluster service stats Daniel Kral
2026-03-31  7:58   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 30/40] usage: use add_service to add service usage to nodes Daniel Kral
2026-03-31  8:12   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 31/40] usage: add dynamic usage scheduler Daniel Kral
2026-03-31  8:15   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 32/40] test: add dynamic usage scheduler test cases Daniel Kral
2026-03-31  8:20   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 35/40] implement automatic rebalancing Daniel Kral
2026-03-31  9:07   ` Dominik Rusovac
2026-03-31  9:07   ` Michael Köppl
2026-03-31  9:16     ` Dominik Rusovac
2026-03-31  9:32       ` Daniel Kral
2026-03-31  9:39         ` Dominik Rusovac
2026-03-31 13:55           ` Daniel Kral
2026-03-31  9:42     ` Daniel Kral
2026-03-31 11:01       ` Michael Köppl
2026-03-31 13:50   ` Daniel Kral
2026-03-30 14:30 ` [PATCH ha-manager v3 36/40] test: add resource bundle generation test cases Daniel Kral
2026-03-31  9:09   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
2026-03-31  9:33   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 38/40] test: add static " Daniel Kral
2026-03-31  9:44   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
2026-03-31  9:48   ` Dominik Rusovac
2026-03-30 14:30 ` [PATCH ha-manager v3 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
2026-03-31 10:06   ` Dominik Rusovac
2026-03-31 20:44 ` partially-applied: [PATCH-SERIES cluster/ha-manager/perl-rs/proxmox v3 00/40] dynamic scheduler + load rebalancer Thomas Lamprecht
2026-04-02 12:55 ` superseded: " Daniel Kral

Service provided by Proxmox Server Solutions GmbH