* [PATCH cluster v4 01/28] datacenter config: restructure verbose description for the ha crs option
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
This makes it a little easier to read and allows appending descriptions
for other values with a cleaner diff.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/DataCenterConfig.pm | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index d88b167..c275163 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -17,9 +17,12 @@ my $crs_format = {
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
- verbose_description => "Configures how the HA manager should select nodes to start or "
- . "recover services. With 'basic', only the number of services is used, with 'static', "
- . "static CPU and memory configuration of services is considered.",
+ verbose_description => <<EODESC,
+Configures how the HA Manager should select nodes to start or recover services:
+
+- with 'basic', only the number of services is used,
+- with 'static', static CPU and memory configuration of services are considered.
+EODESC
},
'ha-rebalance-on-start' => {
type => 'boolean',
--
2.47.3
* [PATCH cluster v4 02/28] datacenter config: add dynamic load scheduler option
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/DataCenterConfig.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index c275163..0225bc6 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -13,7 +13,7 @@ my $PROXMOX_OUI = 'BC:24:11';
my $crs_format = {
ha => {
type => 'string',
- enum => ['basic', 'static'],
+ enum => ['basic', 'static', 'dynamic'],
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
@@ -21,7 +21,8 @@ my $crs_format = {
Configures how the HA Manager should select nodes to start or recover services:
- with 'basic', only the number of services is used,
-- with 'static', static CPU and memory configuration of services are considered.
+- with 'static', static CPU and memory configuration of services are considered,
+- with 'dynamic', static and dynamic CPU and memory usage of services are considered.
EODESC
},
'ha-rebalance-on-start' => {
--
2.47.3
* [PATCH cluster v4 03/28] datacenter config: add auto rebalancing options
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
These options control the behavior of the load balancing system in the
HA Manager.
The imbalance threshold default value is set to `0.3`, as
experimentation with some common cluster sizes showed good results. This
might need more adaptation in the future, such as a cluster-size-dependent
profile setting to find a better default threshold value.
Another imbalance threshold default value that was considered was
`0.15`, which is the minimum threshold needed to detect an imbalance in
a cluster with one node at load 0.0 and all other nodes at load 1.0,
for cluster sizes of up to 45 nodes. For cluster size N, this is
derived with:
node_loads = [0.0] + [1.0 for _ in range(N-1)]
min_imbalance = calculate_node_imbalance(node_loads)
Though a good starting metric, the imbalance threshold of `0.15` would
be too sensitive for small cluster sizes, and `0.3` proved a better
trade-off.
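For reference, a minimal Python sketch of the derivation above. This assumes
the imbalance metric is the coefficient of variation (population standard
deviation divided by the mean) of the per-node loads -- an assumption, not
the HA manager's actual code, but it reproduces the 45-node figure:

```python
import statistics

def calculate_node_imbalance(node_loads):
    # Assumed metric: coefficient of variation of the per-node loads.
    mean = statistics.fmean(node_loads)
    return statistics.pstdev(node_loads) / mean if mean else 0.0

def min_imbalance(n):
    # One idle node, all remaining n-1 nodes fully loaded.
    node_loads = [0.0] + [1.0 for _ in range(n - 1)]
    return calculate_node_imbalance(node_loads)

# min_imbalance(45) ~= 0.151 and min_imbalance(46) ~= 0.149, so a
# threshold of 0.15 detects this imbalance for up to 45 nodes.
```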
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- change threshold default value from 0.7 to 0.3
- add minimum requirements to number fields
src/PVE/DataCenterConfig.pm | 44 +++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 0225bc6..6513594 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -33,6 +33,50 @@ EODESC
"Set to use CRS for selecting a suited node when a HA services request-state"
. " changes from stop to start.",
},
+ 'ha-auto-rebalance' => {
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ description => "Whether to use CRS for balancing HA resources automatically"
+ . " depending on the current node imbalance.",
+ },
+ 'ha-auto-rebalance-threshold' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0.0,
+ default => 0.3,
+ requires => 'ha-auto-rebalance',
+ description => "The threshold for the cluster node imbalance, which will"
+ . " trigger the automatic resource balancing system if its value"
+ . " is exceeded.",
+ },
+ 'ha-auto-rebalance-method' => {
+ type => 'string',
+ enum => ['bruteforce', 'topsis'],
+ optional => 1,
+ default => 'bruteforce',
+ requires => 'ha-auto-rebalance',
+ description => "The method to use for the scoring of balancing migrations.",
+ },
+ 'ha-auto-rebalance-hold-duration' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0,
+ default => 3,
+ requires => 'ha-auto-rebalance',
+ description => "The number of HA rounds for which the cluster node"
+ . " imbalance threshold must be exceeded before triggering an"
+ . " automatic resource balancing migration.",
+ },
+ 'ha-auto-rebalance-margin' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0.0,
+ default => 0.1,
+ requires => 'ha-auto-rebalance',
+ description => "The minimum relative improvement in cluster node"
+ . " imbalance to commit to a resource balancing migration.",
+ },
};
my $migration_format = {
--
2.47.3
* Re: [PATCH cluster v4 03/28] datacenter config: add auto rebalancing options
From: Dominik Rusovac @ 2026-04-02 13:07 UTC
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:43 PM CEST, Daniel Kral wrote:
> These options control the behavior of the load balancing system in the
> HA Manager.
>
> The imbalance threshold default value is set to `0.3`, as
> experimentation with some common cluster sizes showed good results. This
> might need more adaptation in the future, such as a cluster-size-dependent
> profile setting to find a better default threshold value.
+1
>
> Another imbalance threshold default value that was considered was
> `0.15`, which is the minimum threshold needed to detect an imbalance in
> a cluster with one node at load 0.0 and all other nodes at load 1.0,
> for cluster sizes of up to 45 nodes. For cluster size N, this is
> derived with:
>
> node_loads = [0.0] + [1.0 for _ in range(N-1)]
> min_imbalance = calculate_node_imbalance(node_loads)
>
> Though a good starting metric, the imbalance threshold of `0.15` would
> be too sensitive for small cluster sizes, and `0.3` proved a better
> trade-off.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - change threshold default value from 0.7 to 0.3
> - add minimum requirements to number fields
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* [PATCH ha-manager v4 04/28] env: pve2: implement dynamic node and service stats
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
Fetch the dynamic node and service stats with rrd_dump(). These stats
are periodically sampled and broadcast by each PVE node's pvestatd
service and propagated through the pmxcfs.
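As a rough illustration of what this patch does with such a dump entry,
here is a Python sketch (the index constants are the ones introduced below,
taken from PVE::Service::pvestatd; the entry values are made up, and the
function name is hypothetical):

```python
# RRD entry indices for VMs/CTs, as defined in the patch.
RRD_VM_INDEX_MAXCPU = 5
RRD_VM_INDEX_CPU = 6
RRD_VM_INDEX_MAXMEM = 7
RRD_VM_INDEX_MEM = 8

def decode_vm_entry(rrdentry):
    # The broadcast cpu value is a fraction of the guest's maxcpu, so it
    # is scaled to "cores used" here, mirroring the Perl code below.
    maxcpu = float(rrdentry[RRD_VM_INDEX_MAXCPU] or 0.0)
    return {
        "maxcpu": maxcpu,
        "cpu": float(rrdentry[RRD_VM_INDEX_CPU] or 0.0) * maxcpu,
        "maxmem": int(rrdentry[RRD_VM_INDEX_MAXMEM] or 0),
        "mem": int(rrdentry[RRD_VM_INDEX_MEM] or 0),
    }
```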
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- introduce $HOSTNAME_RE and use it for nodename matching in
get_dynamic_node_stats()
src/PVE/HA/Env/PVE2.pm | 65 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 04cd1bfe..fc815fe0 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -42,8 +42,23 @@ my $lockdir = "/etc/pve/priv/lock";
# taken from PVE::Service::pvestatd::update_{lxc,qemu}_status()
use constant {
RRD_VM_INDEX_STATUS => 2,
+ RRD_VM_INDEX_MAXCPU => 5,
+ RRD_VM_INDEX_CPU => 6,
+ RRD_VM_INDEX_MAXMEM => 7,
+ RRD_VM_INDEX_MEM => 8,
};
+# rrd entry indices for PVE nodes
+# taken from PVE::Service::pvestatd::update_node_status()
+use constant {
+ RRD_NODE_INDEX_MAXCPU => 4,
+ RRD_NODE_INDEX_CPU => 5,
+ RRD_NODE_INDEX_MAXMEM => 7,
+ RRD_NODE_INDEX_MEM => 8,
+};
+
+my $HOSTNAME_RE = qr/(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}?[a-zA-Z0-9])?)/;
+
sub new {
my ($this, $nodename) = @_;
@@ -569,6 +584,30 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = get_cluster_service_stats();
+ for my $sid (keys %$stats) {
+ my $id = $stats->{$sid}->{id};
+ my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
+
+ # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
+ my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
+
+ $stats->{$sid}->{usage} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -588,6 +627,32 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = {};
+ for my $key (keys %$rrd) {
+ my ($nodename) = $key =~ m/^pve-node-9.0\/($HOSTNAME_RE)$/;
+
+ next if !$nodename;
+
+ my $rrdentry = $rrd->{$key} // [];
+
+ my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
+
+ $stats->{$nodename} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
* [PATCH ha-manager v4 05/28] sim: hardware: pass correct types for static stats
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
The CRM expects f64 for cpu-related values and usize for mem-related
values. Hence, pass doubles for the former and integers for the latter.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 474cee16..cfcd7ab1 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -488,9 +488,9 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
+ node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +507,7 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4, maxmem => 4096 } } keys %$services };
+ my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
$self->write_static_service_stats($stats);
}
@@ -883,7 +883,7 @@ sub sim_hardware_cmd {
$self->set_static_service_stats(
$sid,
- { maxcpu => $params[0], maxmem => $params[1] },
+ { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
);
} elsif ($action eq 'manual-migrate') {
--
2.47.3
* [PATCH ha-manager v4 06/28] sim: hardware: factor out static stats' default values
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index cfcd7ab1..026be6f8 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,6 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_maxcpu = 4.0;
+my $default_service_maxmem = 4096 * 1024**2;
+my $default_node_maxcpu = 24.0;
+my $default_node_maxmem = 131072 * 1024**2;
+
# Status directory layout
#
# configuration
@@ -488,9 +493,24 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node1 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node2 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node3 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +527,12 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
+ my $stats = {
+ map {
+ $_ => { maxcpu => $default_service_maxcpu, maxmem => $default_service_maxmem }
+ }
+ keys %$services
+ };
$self->write_static_service_stats($stats);
}
--
2.47.3
* [PATCH ha-manager v4 07/28] sim: hardware: fix static stats guard
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
While falsy, values of 0 or 0.0 are valid stats. Hence, use a
'defined' check to avoid skipping falsy static service stats.
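The bug class, expressed as a Python sketch with a hypothetical helper
name: guard on "is not None" rather than truthiness, since 0 and 0.0 are
falsy but valid.

```python
def apply_new_stats(stats, new_stats):
    # Not `if val:` -- that would silently skip 0 and 0.0, which are
    # perfectly valid stat values.
    for key in ("maxmem", "maxcpu"):
        val = new_stats.get(key)
        if val is not None:
            stats[key] = val
    return stats
```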
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 026be6f8..afdb7b5f 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -202,11 +202,11 @@ sub set_static_service_stats {
my $stats = $self->read_static_service_stats();
- if (my $memory = $new_stats->{maxmem}) {
+ if (defined(my $memory = $new_stats->{maxmem})) {
$stats->{$sid}->{maxmem} = $memory;
}
- if (my $cpu = $new_stats->{maxcpu}) {
+ if (defined(my $cpu = $new_stats->{maxcpu})) {
$stats->{$sid}->{maxcpu} = $cpu;
}
--
2.47.3
* [PATCH ha-manager v4 08/28] sim: hardware: handle dynamic service stats
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
This adds functionality to simulate the dynamic stats of a service, that
is, CPU load (cores) and memory usage (MiB).
Analogous to static service stats, tests can specify dynamic service
stats in the file dynamic_service_stats.
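A hypothetical example of such a file's contents: plain JSON keyed by
service ID, with cpu in cores and mem in bytes (the simulator's defaults
below are 2.0 cores and 2048 MiB, stored as bytes). The service IDs here
are made up; a json round-trip models the same on-disk format that the
PVE::HA::Tools JSON helpers read and write.

```python
import json

# Hypothetical dynamic_service_stats contents for a two-service test.
stats = {
    "vm:101": {"cpu": 0.5, "mem": 512 * 1024**2},
    "vm:102": {"cpu": 2.0, "mem": 2048 * 1024**2},
}

serialized = json.dumps(stats, indent=4)
```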
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 52 ++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index afdb7b5f..3439bc36 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,8 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_cpu = 2.0;
my $default_service_maxcpu = 4.0;
+my $default_service_mem = 2048 * 1024**2;
my $default_service_maxmem = 4096 * 1024**2;
+
my $default_node_maxcpu = 24.0;
my $default_node_maxmem = 131072 * 1024**2;
@@ -213,6 +216,25 @@ sub set_static_service_stats {
$self->write_static_service_stats($stats);
}
+sub set_dynamic_service_stats {
+ my ($self, $sid, $new_stats) = @_;
+
+ my $conf = $self->read_service_config();
+ die "no such service '$sid'" if !$conf->{$sid};
+
+ my $stats = $self->read_dynamic_service_stats();
+
+ if (defined(my $memory = $new_stats->{mem})) {
+ $stats->{$sid}->{mem} = $memory;
+ }
+
+ if (defined(my $cpu = $new_stats->{cpu})) {
+ $stats->{$sid}->{cpu} = $cpu;
+ }
+
+ $self->write_dynamic_service_stats($stats);
+}
+
sub add_service {
my ($self, $sid, $opts, $running) = @_;
@@ -438,6 +460,16 @@ sub read_static_service_stats {
return $stats;
}
+sub read_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
+ $self->log('error', "loading dynamic service stats failed - $@") if $@;
+
+ return $stats;
+}
+
sub write_static_service_stats {
my ($self, $stats) = @_;
@@ -446,6 +478,14 @@ sub write_static_service_stats {
$self->log('error', "writing static service stats failed - $@") if $@;
}
+sub write_dynamic_service_stats {
+ my ($self, $stats) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ eval { PVE::HA::Tools::write_json_to_file($filename, $stats) };
+ $self->log('error', "writing dynamic service stats failed - $@") if $@;
+}
+
sub new {
my ($this, $testdir) = @_;
@@ -536,6 +576,18 @@ sub new {
$self->write_static_service_stats($stats);
}
+ if (-f "$testdir/dynamic_service_stats") {
+ copy("$testdir/dynamic_service_stats", "$statusdir/dynamic_service_stats");
+ } else {
+ my $services = $self->read_static_service_stats();
+ my $stats = {
+ map { $_ => { cpu => $default_service_cpu, mem => $default_service_mem } }
+ keys %$services
+ };
+
+ $self->write_dynamic_service_stats($stats);
+ }
+
my $cstatus = $self->read_hardware_status_nolock();
foreach my $node (sort keys %$cstatus) {
--
2.47.3
* [PATCH ha-manager v4 09/28] sim: hardware: add set-dynamic-stats command
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Add a command to set dynamic service stats, and handle the respective
set-dynamic-stats and set-static-stats commands analogously.
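The pairwise parameter handling can be sketched like this in Python
(function name hypothetical; the per-action conversion tables and the
MiB-to-bytes conversion match the Perl below):

```python
def parse_stat_params(action, params):
    # Alternating "<target> <value>" pairs, converted per action.
    conversions = (
        {"maxcpu": float, "maxmem": lambda v: int(v) * 1024**2}
        if action == "set-static-stats"
        else {"cpu": float, "mem": lambda v: int(v) * 1024**2}
    )
    if not params:
        raise ValueError(f"missing target stat for '{action}' command")
    if len(params) % 2:
        raise ValueError(f"missing value for '{action} {params[-1]}' command")
    new_stats = {}
    for target, value in zip(params[::2], params[1::2]):
        if target not in conversions:
            raise ValueError(f"unknown target stat '{target}' for '{action}' command")
        new_stats[target] = conversions[target](value)
    return new_stats
```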
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 34 ++++++++++++++++++++++++++--------
src/PVE/HA/Sim/RTHardware.pm | 4 +++-
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 3439bc36..b641f3c9 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -795,7 +795,8 @@ sub get_cfs_state {
# service <sid> stop <timeout>
# service <sid> lock/unlock [lockname]
# service <sid> add <node> [<request-state=started>] [<running=0>]
-# service <sid> set-static-stats <maxcpu> <maxmem>
+# service <sid> set-static-stats [maxcpu <cores>] [maxmem <MiB>]
+# service <sid> set-dynamic-stats [cpu <cores>] [mem <MiB>]
# service <sid> delete
sub sim_hardware_cmd {
my ($self, $cmdstr, $logid) = @_;
@@ -954,15 +955,32 @@ sub sim_hardware_cmd {
$params[2] || 0,
);
- } elsif ($action eq 'set-static-stats') {
- die "sim_hardware_cmd: missing maxcpu for '$action' command" if !$params[0];
- die "sim_hardware_cmd: missing maxmem for '$action' command" if !$params[1];
+ } elsif ($action eq 'set-static-stats' || $action eq 'set-dynamic-stats') {
+ die "sim_hardware_cmd: missing target stat for '$action' command"
+ if !@params;
- $self->set_static_service_stats(
- $sid,
- { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
- );
+ my $conversions =
+ $action eq 'set-static-stats'
+ ? { maxcpu => sub { 0.0 + $_[0] }, maxmem => sub { $_[0] * 1024**2 } }
+ : { cpu => sub { 0.0 + $_[0] }, mem => sub { $_[0] * 1024**2 } };
+ my %new_stats;
+ for my ($target, $val) (@params) {
+ die "sim_hardware_cmd: missing value for '$action $target' command"
+ if !defined($val);
+
+ my $convert = $conversions->{$target}
+ or die
+ "sim_hardware_cmd: unknown target stat '$target' for '$action' command";
+
+ $new_stats{$target} = $convert->($val);
+ }
+
+ if ($action eq 'set-static-stats') {
+ $self->set_static_service_stats($sid, \%new_stats);
+ } else {
+ $self->set_dynamic_service_stats($sid, \%new_stats);
+ }
} elsif ($action eq 'manual-migrate') {
die "sim_hardware_cmd: missing target node for '$action' command"
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 9a83d098..9528f542 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -532,7 +532,9 @@ sub show_service_add_dialog {
my $maxcpu = $cpu_count_spin->get_value();
my $maxmem = $memory_spin->get_value();
- $self->sim_hardware_cmd("service $sid set-static-stats $maxcpu $maxmem", 'command');
+ $self->sim_hardware_cmd(
+ "service $sid set-static-stats maxcpu $maxcpu maxmem $maxmem", 'command',
+ );
$self->add_service_to_gui($sid);
}
--
2.47.3
* [PATCH ha-manager v4 10/28] sim: hardware: add getters for dynamic {node,service} stats
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Aggregation of the dynamic node stats is done lazily.
The getters log at warning level in case of overcommitted stats.
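In outline, the aggregation works as sketched below (a Python sketch of
the Perl in get_dynamic_node_stats; names and data shapes hypothetical):
start from each node's static capacity with zeroed usage, then add the
dynamic usage of every service the LRM reports as running.

```python
def aggregate_dynamic_node_stats(node_stats, service_conf, service_stats, running):
    # Static capacities plus zeroed usage as the starting point.
    totals = {
        node: {**caps, "cpu": caps.get("cpu", 0.0), "mem": caps.get("mem", 0)}
        for node, caps in node_stats.items()
    }
    for sid, conf in service_conf.items():
        node = conf["node"]
        if not running.get(node, {}).get(sid):
            continue  # only count services actually running on the node
        usage = service_stats[sid]["usage"]
        totals[node]["cpu"] += usage["cpu"]
        totals[node]["mem"] += usage["mem"]
        if totals[node]["cpu"] > totals[node]["maxcpu"]:
            print(f"warning: overcommitted cpu on '{node}'")
        if totals[node]["mem"] > totals[node]["maxmem"]:
            print(f"warning: overcommitted mem on '{node}'")
    return totals
```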
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Env.pm | 12 ++++++++
src/PVE/HA/Sim/Hardware.pm | 61 ++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index ad51245c..65d4efad 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -500,12 +500,24 @@ sub get_static_service_stats {
return $self->{hardware}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{hardware}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index b641f3c9..1959f5c9 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1232,6 +1232,27 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $stats = get_cluster_service_stats($self);
+ my $static_stats = $self->read_static_service_stats();
+ my $dynamic_stats = $self->read_dynamic_service_stats();
+
+ for my $sid (keys %$stats) {
+ $stats->{$sid}->{usage} = {
+ $static_stats->{$sid}->%*, $dynamic_stats->{$sid}->%*,
+ };
+
+ $self->log('warning', "overcommitted cpu on '$sid'")
+ if $stats->{$sid}->{usage}->{cpu} > $stats->{$sid}->{usage}->{maxcpu};
+ $self->log('warning', "overcommitted mem on '$sid'")
+ if $stats->{$sid}->{usage}->{mem} > $stats->{$sid}->{usage}->{maxmem};
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -1245,6 +1266,46 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $stats = $self->get_static_node_stats();
+ for my $node (keys %$stats) {
+ $stats->{$node}->{maxcpu} = $stats->{$node}->{maxcpu} // $default_node_maxcpu;
+ $stats->{$node}->{cpu} = $stats->{$node}->{cpu} // 0.0;
+ $stats->{$node}->{maxmem} = $stats->{$node}->{maxmem} // $default_node_maxmem;
+ $stats->{$node}->{mem} = $stats->{$node}->{mem} // 0;
+ }
+
+ my $service_conf = $self->read_service_config();
+ my $dynamic_service_stats = $self->get_dynamic_service_stats();
+
+ my $cstatus = $self->read_hardware_status_nolock();
+ my $node_service_status = { map { $_ => $self->read_service_status($_) } keys %$cstatus };
+
+ for my $sid (keys %$service_conf) {
+ my $node = $service_conf->{$sid}->{node};
+
+ # only add the dynamic load usage to node if service is actually marked
+ # as running by the node service status written by the LRM
+ if ($node_service_status->{$node}->{$sid}) {
+ my ($cpu, $mem) = $dynamic_service_stats->{$sid}->{usage}->@{qw(cpu mem)};
+
+ die "unknown cpu load for '$sid'" if !defined($cpu);
+ $stats->{$node}->{cpu} += $cpu;
+ $self->log('warning', "overcommitted cpu on '$node'")
+ if $stats->{$node}->{cpu} > $stats->{$node}->{maxcpu};
+
+ die "unknown memory usage for '$sid'" if !defined($mem);
+ $stats->{$node}->{mem} += $mem;
+ $self->log('warning', "overcommitted mem on '$node'")
+ if $stats->{$node}->{mem} > $stats->{$node}->{maxmem};
+ }
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
* [PATCH ha-manager v4 11/28] usage: pass service data to add_service_usage
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
The method already depends on three members of the service data, and a
following patch will need a fourth member to add more information to the
Usage implementations.
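The shape of the refactor, as a Python sketch (names and the node-selection
rules are a hypothetical simplification, not the HA manager's actual
logic): the whole service-data record is passed and unpacked in one place,
so future patches can consume additional members without touching every
call site.

```python
def get_used_service_nodes(online_nodes, state, node, target):
    # Hypothetical simplification: a non-stopped service consumes its
    # current node; a migrating one also consumes the target node.
    current = node if node in online_nodes and state != "stopped" else None
    tgt = target if target in online_nodes and state == "migrate" else None
    return current, tgt

def add_service_usage(online_nodes, sid, sd):
    # Destructure the service-data record here instead of threading
    # state/node/target through every caller.
    state, node, target = (sd.get(k) for k in ("state", "node", "target"))
    return get_used_service_nodes(online_nodes, state, node, target)
```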
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 11 +++++------
src/PVE/HA/Usage.pm | 6 +++---
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index fbc7f931..71f45b5c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -284,17 +284,17 @@ sub recompute_online_node_usage {
foreach my $sid (sort keys %{ $self->{ss} }) {
my $sd = $self->{ss}->{$sid};
- $online_node_usage->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $online_node_usage->add_service_usage($sid, $sd);
}
# add remaining non-HA resources to online node usage
for my $sid (sort keys %$service_stats) {
next if $self->{ss}->{$sid};
- my ($node, $state) = $service_stats->{$sid}->@{qw(node state)};
-
# the migration target is not known for non-HA resources
- $online_node_usage->add_service_usage($sid, $state, $node, undef);
+ my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+
+ $online_node_usage->add_service_usage($sid, $sd);
}
$self->{online_node_usage} = $online_node_usage;
@@ -332,8 +332,7 @@ my $change_service_state = sub {
}
$self->{online_node_usage}->remove_service_usage($sid);
- $self->{online_node_usage}
- ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $self->{online_node_usage}->add_service_usage($sid, $sd);
$sd->{uid} = compute_new_uuid($new_state);
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 9f19a82b..6d53f956 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -40,12 +40,12 @@ sub add_service_usage_to_node {
die "implement in subclass";
}
-# Adds service $sid's usage to the online nodes according to their $state,
-# $service_node and $migration_target.
+# Adds service $sid's usage to the online nodes according to their service data $sd.
sub add_service_usage {
- my ($self, $sid, $service_state, $service_node, $migration_target) = @_;
+ my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
+ my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
my ($current_node, $target_node) =
get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
--
2.47.3
* [PATCH ha-manager v4 12/28] usage: pass service data to get_used_service_nodes
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (10 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 11/28] usage: pass service data to add_service_usage Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats Daniel Kral
` (17 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Also remove some unnecessary destructuring syntax around the helper.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Rules/ResourceAffinity.pm | 3 +--
src/PVE/HA/Usage.pm | 13 ++++++-------
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 1c610430..474d3000 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -511,8 +511,7 @@ sub get_resource_affinity {
my $get_used_service_nodes = sub {
my ($sid) = @_;
return (undef, undef) if !defined($ss->{$sid});
- my ($state, $node, $target) = $ss->{$sid}->@{qw(state node target)};
- return PVE::HA::Usage::get_used_service_nodes($online_nodes, $state, $node, $target);
+ return PVE::HA::Usage::get_used_service_nodes($online_nodes, $ss->{$sid});
};
for my $csid (keys $positive->%*) {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 6d53f956..be3e64d6 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -45,9 +45,7 @@ sub add_service_usage {
my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
- my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
- my ($current_node, $target_node) =
- get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
+ my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
$self->add_service_usage_to_node($current_node, $sid) if $current_node;
$self->add_service_usage_to_node($target_node, $sid) if $target_node;
@@ -66,11 +64,12 @@ sub score_nodes_to_start_service {
die "implement in subclass";
}
-# Returns the current and target node as a two-element array, that a service
-# puts load on according to the $online_nodes and the service's $state, $node
-# and $target.
+# Returns a two-element array of the nodes a service puts load on
+# (current and target), given $online_nodes and service data $sd.
sub get_used_service_nodes {
- my ($online_nodes, $state, $node, $target) = @_;
+ my ($online_nodes, $sd) = @_;
+
+ my ($state, $node, $target) = $sd->@{qw(state node target)};
return (undef, undef) if $state eq 'stopped' || $state eq 'request_start';
--
2.47.3
* [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (11 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 12/28] usage: pass service data to get_used_service_nodes Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes Daniel Kral
` (16 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The running flag is needed to distinguish starting resources from
started ones; it is a required parameter of the new add_service(...)
method of the resource scheduling bindings.
The HA Manager tracks whether HA resources are in 'started' state and
whether the LRM acknowledged that these are running. For non-HA
resources, the rrd_dump data contains a running flag for VM and CT
guests.
See the next patch, which makes the usage implementations pass the
running flag to the add_service(...) method, for more details.
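As a sketch (assuming the entry shape from the hunks below), the running
flag for non-HA resources is derived directly from the reported state:

```perl
use strict;
use warnings;

# hypothetical stats entry for a non-HA guest, as assembled in
# get_cluster_service_stats(); only started guests count as running
my $state = 'started';
my $entry = {
    node    => 'node1',
    state   => $state,
    running => $state eq 'started',
};

print $entry->{running} ? "running\n" : "not running\n";    # prints "running"
```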
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Env/PVE2.pm | 1 +
src/PVE/HA/Manager.pm | 2 +-
src/PVE/HA/Sim/Hardware.pm | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index fc815fe0..3caf32fc 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -551,6 +551,7 @@ my sub get_cluster_service_stats {
id => $id,
node => $nodename,
state => $state,
+ running => $state eq 'started',
type => $type,
usage => {},
};
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 71f45b5c..5b2715c7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -292,7 +292,7 @@ sub recompute_online_node_usage {
next if $self->{ss}->{$sid};
# the migration target is not known for non-HA resources
- my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+ my $sd = { $service_stats->{$sid}->%{qw(node state running)} };
$online_node_usage->add_service_usage($sid, $sd);
}
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 1959f5c9..82f85c97 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1201,6 +1201,7 @@ my sub get_cluster_service_stats {
$stats->{$sid} = {
node => $cfg->{node},
state => $cfg->{state},
+ running => $cfg->{state} eq 'started',
usage => {},
};
}
--
2.47.3
* [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (12 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler Daniel Kral
` (15 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The pve_static (and upcoming pve_dynamic) bindings expose the new
add_resource(...) method, which allows adding resources in a single call
together with the additional running flag.
The running flag is needed to discriminate starting and started HA
resources from each other, which is needed to correctly account for HA
resources for the dynamic load usage implementation in the next patch.
This is because, for the dynamic load usage, any HA resource that is
scheduled to start by the HA Manager in the same round would otherwise
not be accounted for in the next call to
score_nodes_to_start_resource(...). This is not a problem for the static
load usage, because there the current node usages are already derived
from the started resources on every call.
Passing only the HA resources' 'state' property is not enough, since the
HA Manager will move any HA resource from the 'request_start' state (or
through other transient states such as 'request_start_balance' and a
successful 'migrate'/'relocate') into the 'started' state.
This 'started' state is then picked up by the HA resource's LRM, which
will actually start the HA resource and, if successful, respond with a
'SUCCESS' LRM result. Only then does the HA Manager acknowledge this by
adding the running flag to the HA resource's state.
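The state machine described above boils down to a single condition; the
following is a sketch of the check added to add_service_usage() below
(the helper name is made up for illustration):

```perl
use strict;
use warnings;

# a service counts as running if the LRM already acknowledged the start
# ('started' state with the running flag set), or if it is in a transient
# non-'started' state but still occupies a current node (e.g. 'migrate')
sub is_considered_running {
    my ($sd, $current_node) = @_;

    return ($sd->{state} eq 'started' && $sd->{running})
        || ($sd->{state} ne 'started' && defined($current_node));
}

print is_considered_running({ state => 'migrate', running => 0 }, 'node1')
    ? "running\n" : "not running\n";    # prints "running"
```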
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Usage.pm | 13 ++++++++-----
src/PVE/HA/Usage/Basic.pm | 9 ++++++++-
src/PVE/HA/Usage/Static.pm | 30 ++++++++++++++++++++++++------
3 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index be3e64d6..43feb041 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,9 +33,8 @@ sub contains_node {
die "implement in subclass";
}
-# Logs a warning to $haenv upon failure, but does not die.
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
die "implement in subclass";
}
@@ -47,8 +46,12 @@ sub add_service_usage {
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
- $self->add_service_usage_to_node($current_node, $sid) if $current_node;
- $self->add_service_usage_to_node($target_node, $sid) if $target_node;
+ # some usage implementations need to discern whether a service is truly running;
+ # a service only has the 'running' flag in the 'started' state
+ my $running = ($sd->{state} eq 'started' && $sd->{running})
+ || ($sd->{state} ne 'started' && defined($current_node));
+
+ $self->add_service($sid, $current_node, $target_node, $running);
}
sub remove_service_usage {
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 2584727b..5aa3ac05 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -38,7 +38,7 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
-sub add_service_usage_to_node {
+my sub add_service_usage_to_node {
my ($self, $nodename, $sid) = @_;
if ($self->contains_node($nodename)) {
@@ -51,6 +51,13 @@ sub add_service_usage_to_node {
}
}
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ add_service_usage_to_node($self, $current_node, $sid) if defined($current_node);
+ add_service_usage_to_node($self, $target_node, $sid) if defined($target_node);
+}
+
sub remove_service_usage {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index b60f5000..8c7a614b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -71,17 +71,35 @@ my sub get_service_usage {
return $service_stats;
}
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
- $self->{'node-services'}->{$nodename}->{$sid} = 1;
+ # do not add services that do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ # PVE::RS::ResourceScheduling::Static::add_service() expects $current_node
+ # to be set, so consider $target_node as $current_node for unset $current_node;
+ #
+ # currently, this happens for the request_start_balance service state and if
+ # node maintenance causes services to migrate to other nodes
+ if (!defined($current_node)) {
+ $current_node = $target_node;
+ undef $target_node;
+ }
eval {
my $service_usage = get_service_usage($self, $sid);
- $self->{scheduler}->add_service_usage_to_node($nodename, $sid, $service_usage);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ 'current-node' => $current_node,
+ 'target-node' => $target_node,
+ };
+
+ $self->{scheduler}->add_service($sid, $service);
};
- $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
- if $@;
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
}
sub remove_service_usage {
--
2.47.3
* [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (13 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases Daniel Kral
` (14 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The dynamic usage scheduler allows the HA Manager to make scheduling
decisions based on the current usage of the nodes and cluster resources
in addition to the maximum usage stats as reported by the PVE::HA::Env
implementation.
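One detail worth a sketch (the values below are made up):
score_nodes_to_start_resource(...) from the scheduling bindings returns
[node, score] pairs where a higher score is better, while the Perl
callers expect a hash where lower is better, hence the sign flip at the
end of score_nodes_to_start_service() below:

```perl
use strict;
use warnings;

# hypothetical score list as returned by the scheduling bindings,
# where a higher score marks a better target node
my $score_list = [ [ 'node1', 0.8 ], [ 'node2', 0.3 ] ];

# flip the sign so that a lower value is better, as the callers expect
my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };

# node1 (highest raw score) now has the lowest value, i.e. wins
my ($best) = sort { $scores->{$a} <=> $scores->{$b} } keys $scores->%*;
print "$best\n";    # prints "node1"
```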
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Env.pm | 12 ++++
src/PVE/HA/Manager.pm | 21 ++++++
src/PVE/HA/Usage/Dynamic.pm | 122 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Makefile | 2 +-
5 files changed, 157 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage/Dynamic.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 38d5d60b..75220a0b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -42,6 +42,7 @@
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
/usr/share/perl5/PVE/HA/Usage/Static.pm
+/usr/share/perl5/PVE/HA/Usage/Dynamic.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
/usr/share/pve-manager/templates/default/fencing-body.html.hbs
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 3643292e..44c26854 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -312,12 +312,24 @@ sub get_static_service_stats {
return $self->{plug}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{plug}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 5b2715c7..c60ab595 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -21,6 +21,12 @@ eval {
$have_static_scheduling = 1;
};
+my $have_dynamic_scheduling;
+eval {
+ require PVE::HA::Usage::Dynamic;
+ $have_dynamic_scheduling = 1;
+};
+
## Variable Name & Abbreviations Convention
#
# The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -267,6 +273,21 @@ sub recompute_online_node_usage {
'warning',
"fallback to 'basic' scheduler mode, init for 'static' failed - $@",
) if $@;
+ } elsif ($mode eq 'dynamic') {
+ if ($have_dynamic_scheduling) {
+ $online_node_usage = eval {
+ $service_stats = $haenv->get_dynamic_service_stats();
+ my $scheduler = PVE::HA::Usage::Dynamic->new($haenv, $service_stats);
+ $scheduler->add_node($_) for $online_nodes->@*;
+ return $scheduler;
+ };
+ } else {
+ $@ = "dynamic scheduling not available\n";
+ }
+ $haenv->log(
+ 'warning',
+ "fallback to 'basic' scheduler mode, init for 'dynamic' failed - $@",
+ ) if $@;
} elsif ($mode eq 'basic') {
# handled below in the general fall-back case
} else {
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
new file mode 100644
index 00000000..24c85a41
--- /dev/null
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -0,0 +1,122 @@
+package PVE::HA::Usage::Dynamic;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Dynamic;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv, $service_stats) = @_;
+
+ my $node_stats = eval { $haenv->get_dynamic_node_stats() };
+ die "did not get dynamic node usage information - $@" if $@;
+
+ my $scheduler = eval { PVE::RS::ResourceScheduling::Dynamic->new() };
+ die "unable to initialize dynamic scheduling - $@" if $@;
+
+ return bless {
+ 'node-stats' => $node_stats,
+ 'service-stats' => $service_stats,
+ haenv => $haenv,
+ scheduler => $scheduler,
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ my $stats = $self->{'node-stats'}->{$nodename}
+ or die "did not get dynamic node usage information for '$nodename'\n";
+ die "dynamic node usage information for '$nodename' missing cpu count\n"
+ if !defined($stats->{maxcpu});
+ die "dynamic node usage information for '$nodename' missing memory\n"
+ if !defined($stats->{maxmem});
+
+ eval { $self->{scheduler}->add_node($nodename, $stats); };
+ die "initializing dynamic node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+ my ($self, $sid) = @_;
+
+ my $service_stats = $self->{'service-stats'}->{$sid}->{usage}
+ or die "did not get dynamic service usage information for '$sid'\n";
+
+ return $service_stats;
+}
+
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ # do not add a service that does not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ # PVE::RS::ResourceScheduling::Dynamic::add_resource() expects $current_node
+ # to be set, so consider $target_node as $current_node for unset $current_node;
+ #
+ # currently, this happens for the request_start_balance service state and if
+ # node maintenance causes services to migrate to other nodes
+ if (!defined($current_node)) {
+ $current_node = $target_node;
+ undef $target_node;
+ }
+
+ eval {
+ my $service_usage = get_service_usage($self, $sid);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ 'current-node' => $current_node,
+ 'target-node' => $target_node,
+ };
+
+ $self->{scheduler}->add_resource($sid, $service);
+ };
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
+}
+
+sub remove_service_usage {
+ my ($self, $sid) = @_;
+
+ eval { $self->{scheduler}->remove_resource($sid) };
+ $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid) = @_;
+
+ my $score_list = eval {
+ my $service_usage = get_service_usage($self, $sid);
+ $self->{scheduler}->score_nodes_to_start_resource($service_usage);
+ };
+ $self->{haenv}
+ ->log('err', "unable to score nodes according to dynamic usage for service '$sid' - $@")
+ if $@;
+
+ # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+ return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index befdda60..5d51a9c1 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,5 +1,5 @@
SIM_SOURCES=Basic.pm
-SOURCES=${SIM_SOURCES} Static.pm
+SOURCES=${SIM_SOURCES} Static.pm Dynamic.pm
.PHONY: install
install:
--
2.47.3
* [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (14 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion Daniel Kral
` (13 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the scheduler using the
dynamic usage information of the HA resources, with rebalance-on-start
disabled and enabled, respectively.
As the mechanisms for the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:
- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
correctly simulates that the previously scheduled HA resources are
already started
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/test/test-crs-dynamic-rebalance1/README | 3 +
src/test/test-crs-dynamic-rebalance1/cmdlist | 4 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 7 ++
.../hardware_status | 5 ++
.../test-crs-dynamic-rebalance1/log.expect | 82 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 7 ++
.../static_service_stats | 7 ++
src/test/test-crs-dynamic1/README | 4 +
src/test/test-crs-dynamic1/cmdlist | 4 +
src/test/test-crs-dynamic1/datacenter.cfg | 6 ++
.../test-crs-dynamic1/dynamic_service_stats | 3 +
src/test/test-crs-dynamic1/hardware_status | 5 ++
src/test/test-crs-dynamic1/log.expect | 51 ++++++++++++
src/test/test-crs-dynamic1/manager_status | 1 +
src/test/test-crs-dynamic1/service_config | 3 +
.../test-crs-dynamic1/static_service_stats | 3 +
18 files changed, 203 insertions(+)
create mode 100644 src/test/test-crs-dynamic-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic1/README
create mode 100644 src/test/test-crs-dynamic1/cmdlist
create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic1/hardware_status
create mode 100644 src/test/test-crs-dynamic1/log.expect
create mode 100644 src/test/test-crs-dynamic1/manager_status
create mode 100644 src/test/test-crs-dynamic1/service_config
create mode 100644 src/test/test-crs-dynamic1/static_service_stats
diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how, after a node failure, the recovery
+gets balanced out for a small batch of HA resources with the dynamic
+usage information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-rebalance-on-start": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+ "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+ "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+ "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+ "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..5c8b050c
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,82 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:105: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:101 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:102 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:102 - end relocate to node 'node2'
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: service vm:104 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:104 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:105 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:105 - end relocate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started' (node = node2)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 41 node1/lrm: starting service vm:104
+info 41 node1/lrm: service status vm:104 started
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 43 node2/lrm: starting service vm:105
+info 43 node2/lrm: service status vm:105 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "dynamic"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'dynamic'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (15 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
` (12 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The name is misleading: the method does not execute the HA resource
migration, but only queues the HA resource to change into the state
'migrate' or 'relocate', which is then picked up by the respective LRM
for execution.
The term 'resource motion' also generalizes the different actions
implied by the 'migrate' and 'relocate' commands and states.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c60ab595..c8a1a35b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -419,7 +419,7 @@ sub read_lrm_status {
return ($results, $modes);
}
-sub execute_migration {
+sub queue_resource_motion {
my ($self, $cmd, $task, $sid, $target) = @_;
my ($haenv, $ss) = $self->@{qw(haenv ss)};
@@ -488,7 +488,7 @@ sub update_crm_commands {
"ignore crm command - service already on target node: $cmd",
);
} else {
- $self->execute_migration($cmd, $task, $sid, $node);
+ $self->queue_resource_motion($cmd, $task, $sid, $node);
}
}
} else {
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (16 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
` (11 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c8a1a35b..2576c762 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -94,11 +94,12 @@ sub update_crs_scheduler_mode {
my $haenv = $self->{haenv};
my $dc_cfg = $haenv->get_datacenter_settings();
+ my $crs_cfg = $dc_cfg->{crs};
- $self->{crs}->{rebalance_on_request_start} = !!$dc_cfg->{crs}->{'ha-rebalance-on-start'};
+ $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
my $old_mode = $self->{crs}->{scheduler};
- my $new_mode = $dc_cfg->{crs}->{ha} || 'basic';
+ my $new_mode = $crs_cfg->{ha} || 'basic';
if (!defined($old_mode)) {
$haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 19/28] implement automatic rebalancing
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (17 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:14 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases Daniel Kral
` (10 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
If the automatic load balancing system is enabled, it checks whether the
cluster node imbalance exceeds a user-defined threshold for a number of
consecutive HA Manager rounds (the "hold duration"). If it does, it
chooses the resource motion that best improves the cluster node
imbalance and queues it, but only if it improves the imbalance by at
least a user-defined relative improvement (the "margin").
This patch introduces resource bundles, which ensure that HA resources
in strict positive resource affinity rules are considered as a whole
"bundle" instead of as individual HA resources.
Specifically, active and stationary resource bundles are resource
bundles that have at least one resource running and all resources
located on the same node. This distinction is needed as newly created
strict positive resource affinity rules may still require some resource
motions to enforce the rule.
Additionally, the migration candidate generation prunes any target
nodes that do not adhere to the HA rules of these resource bundles
before scoring the migration candidates.
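The gating described in the commit message (threshold, hold duration,
margin) can be modeled compactly. The following is an illustrative
Python sketch of the decision flow, not the actual Perl implementation;
the defaults mirror this patch (threshold 0.3, hold duration 3 rounds,
margin 0.1), and all names are hypothetical:

```python
class Rebalancer:
    """Model of the auto-rebalance gating: threshold, hold duration, margin."""

    def __init__(self, threshold=0.3, hold_duration=3, margin=0.1):
        self.threshold = threshold
        self.hold_duration = hold_duration
        self.margin = margin
        # round counter; in the patch this is not persisted across a CRM failover
        self.sustained_rounds = 0

    def decide(self, imbalance, best_target_imbalance):
        """Return True if the best candidate motion should be queued this round."""
        # the <= relation also prevents triggering for imbalance == 0.0
        if imbalance <= self.threshold:
            self.sustained_rounds = 0
            return False
        self.sustained_rounds += 1
        if self.sustained_rounds < self.hold_duration:
            return False
        self.sustained_rounds = 0
        # only queue the motion if it improves the imbalance by at least the margin
        relative_change = (imbalance - best_target_imbalance) / imbalance
        return relative_change >= self.margin

r = Rebalancer()
# imbalance 0.5 sustained for three rounds; best candidate would reach 0.2
print([r.decide(0.5, 0.2) for _ in range(3)])  # [False, False, True]
```

A candidate that barely improves the imbalance (e.g. 0.5 -> 0.48, a 4%
relative change) is rejected by the 10% margin even after the hold
duration elapses.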
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- change imbalance threshold default value from 0.7 to 0.3
- use sprintf() for float number printing instead of perly rounding
logic
- print before and expected after values
- implement PVE::HA::Usage::Basic rebalancing methods as well with
sensible return values, but which are only used to not throw errors if
a failback from 'dynamic'/'static' to 'basic' happens in
recompute_online_node_usage()
src/PVE/HA/Manager.pm | 178 +++++++++++++++++++++++++++++++++++-
src/PVE/HA/Usage.pm | 34 +++++++
src/PVE/HA/Usage/Basic.pm | 18 ++++
src/PVE/HA/Usage/Dynamic.pm | 33 +++++++
src/PVE/HA/Usage/Static.pm | 33 +++++++
5 files changed, 295 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 2576c762..b69a6bba 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -59,10 +59,17 @@ sub new {
my $self = bless {
haenv => $haenv,
- crs => {},
+ crs => {
+ auto_rebalance => {},
+ },
last_rules_digest => '',
last_groups_digest => '',
last_services_digest => '',
+ # used to track how many HA rounds the imbalance threshold has been exceeded
+ #
+ # this is not persisted for a CRM failover as in the mean time
+ # the usage statistics might have change quite a bit already
+ sustained_imbalance_round => 0,
group_migration_round => 3, # wait a little bit
}, $class;
@@ -97,6 +104,13 @@ sub update_crs_scheduler_mode {
my $crs_cfg = $dc_cfg->{crs};
$self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
+ $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
+ $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.3;
+ $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
+ // 'bruteforce';
+ $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
+ // 3;
+ $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
my $old_mode = $self->{crs}->{scheduler};
my $new_mode = $crs_cfg->{ha} || 'basic';
@@ -114,6 +128,149 @@ sub update_crs_scheduler_mode {
return;
}
+# Returns a hash of lists, which contain the running, non-moving HA resource
+# bundles, which are on the same node, implied by the strict positive resource
+# affinity rules.
+#
+# Each resource bundle has a leader, which is the alphabetically first running
+# HA resource in the resource bundle and also the key of each resource bundle
+# in the returned hash.
+sub get_active_stationary_resource_bundles {
+ my ($ss, $resource_affinity) = @_;
+
+ my $resource_bundles = {};
+OUTER: for my $sid (sort keys %$ss) {
+ # do not consider non-started resource as 'active' leading resource
+ next if $ss->{$sid}->{state} ne 'started';
+
+ my @resources = ($sid);
+ my $nodes = { $ss->{$sid}->{node} => 1 };
+
+ my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
+ if (%$dependent_resources) {
+ for my $csid (keys %$dependent_resources) {
+ next if !defined($ss->{$csid});
+ my ($state, $node) = $ss->{$csid}->@{qw(state node)};
+
+ # do not consider stationary bundle if a dependent resource moves
+ next OUTER if $state eq 'migrate' || $state eq 'relocate';
+ # do not add non-started resource to active bundle
+ next if $state ne 'started';
+
+ $nodes->{$node} = 1;
+
+ push @resources, $csid;
+ }
+
+ @resources = sort @resources;
+ }
+
+ # skip resource bundles, which are not on the same node yet
+ next if keys %$nodes > 1;
+
+ my $leader_sid = $resources[0];
+
+ $resource_bundles->{$leader_sid} = \@resources;
+ }
+
+ return $resource_bundles;
+}
+
+# Returns a hash of hashes, where each item contains the resource bundle's
+# leader, the list of HA resources in the resource bundle, and the list of
+# possible nodes to migrate to.
+sub get_resource_migration_candidates {
+ my ($self) = @_;
+
+ my ($ss, $compiled_rules, $online_node_usage) =
+ $self->@{qw(ss compiled_rules online_node_usage)};
+ my ($node_affinity, $resource_affinity) =
+ $compiled_rules->@{qw(node-affinity resource-affinity)};
+
+ my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ my @compact_migration_candidates = ();
+ for my $leader_sid (sort keys %$resource_bundles) {
+ my $current_leader_node = $ss->{$leader_sid}->{node};
+ my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
+
+ my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+ my ($together, $separate) =
+ get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
+ apply_negative_resource_affinity($separate, $target_nodes);
+
+ delete $target_nodes->{$current_leader_node};
+
+ next if !%$target_nodes;
+
+ push @compact_migration_candidates,
+ {
+ leader => $leader_sid,
+ nodes => [sort keys %$target_nodes],
+ resources => $resource_bundles->{$leader_sid},
+ };
+ }
+
+ return \@compact_migration_candidates;
+}
+
+sub load_balance {
+ my ($self) = @_;
+
+ my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
+ my ($auto_rebalance_opts) = $crs->{auto_rebalance};
+
+ return if !$auto_rebalance_opts->{enable};
+ return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
+ return if $self->any_resource_motion_queued_or_running();
+
+ my ($threshold, $method, $hold_duration, $margin) =
+ $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
+
+ my $imbalance = $online_node_usage->calculate_node_imbalance();
+
+ # do not load balance unless imbalance threshold has been exceeded
+ # consecutively for $hold_duration calls to load_balance();
+ # the <= relation prevents load balancing from triggering for $imbalance = 0.0
+ if ($imbalance <= $threshold) {
+ $self->{sustained_imbalance_round} = 0;
+ return;
+ } else {
+ $self->{sustained_imbalance_round}++;
+ return if $self->{sustained_imbalance_round} < $hold_duration;
+ $self->{sustained_imbalance_round} = 0;
+ }
+
+ my $candidates = $self->get_resource_migration_candidates();
+
+ my $result;
+ if ($method eq 'bruteforce') {
+ $result = $online_node_usage->select_best_balancing_migration($candidates);
+ } elsif ($method eq 'topsis') {
+ $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
+ }
+
+ # happens if $candidates is empty or $method isn't handled above
+ return if !$result;
+
+ my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
+
+ my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
+ return if $relative_change < $margin;
+
+ my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
+
+ my (undef, $type, $id) = $haenv->parse_sid($sid);
+ my $task = $type eq 'vm' ? "migrate" : "relocate";
+ my $cmd = "$task $sid $target";
+
+ my $imbalance_change_str =
+ sprintf("expected change for imbalance from %.2f to %.2f", $imbalance, $target_imbalance);
+ $haenv->log('info', "auto rebalance - $task $sid to $target ($imbalance_change_str)");
+
+ $self->queue_resource_motion($cmd, $task, $sid, $target);
+}
+
sub cleanup {
my ($self) = @_;
@@ -466,6 +623,21 @@ sub queue_resource_motion {
}
}
+sub any_resource_motion_queued_or_running {
+ my ($self) = @_;
+
+ my ($ss) = $self->@{qw(ss)};
+
+ for my $sid (keys %$ss) {
+ my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
+
+ return 1 if $state eq 'migrate' || $state eq 'relocate';
+ return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
+ }
+
+ return 0;
+}
+
# read new crm commands and save them into crm master status
sub update_crm_commands {
my ($self) = @_;
@@ -902,6 +1074,10 @@ sub manage {
return; # disarm active and progressing, skip normal service state machine
}
# disarm deferred - fall through but only process services in transient states
+ } else {
+ # load balance only if disarm is disabled as during a deferred disarm
+ # the HA Manager should not introduce any new migrations
+ $self->load_balance();
}
$self->{all_lrms_disarmed} = 0;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 43feb041..659ab30a 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -60,6 +60,40 @@ sub remove_service_usage {
die "implement in subclass";
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ die "implement in subclass";
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration_topsis {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
# Returns a hash with $nodename => $score pairs. A lower $score is better.
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 5aa3ac05..4dce9e17 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -66,6 +66,24 @@ sub remove_service_usage {
}
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ return 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ return [];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ return [];
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
index 24c85a41..76d0feaa 100644
--- a/src/PVE/HA/Usage/Dynamic.pm
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -104,6 +104,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate dynamic node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 8c7a614b..e67d5f5b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -111,6 +111,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate static node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH ha-manager v4 19/28] implement automatic rebalancing
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
@ 2026-04-02 13:14 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:14 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> If the automatic load balancing system is enabled, it checks whether the
> cluster node imbalance exceeds some user-defined threshold for some HA
> Manager rounds ("hold duration"). If it does exceed on consecutive HA
> Manager rounds, it will choose the best resource motion to improve the
> cluster node imbalance and queue it if it significantly improves it by
> some user-defined imbalance improvement ("margin").
>
> This patch introduces resource bundles, which ensure that HA resources
> in strict positive resource affinity rules are considered as a whole
> "bundle" instead of individual HA resources.
>
> Specifically, active and stationary resource bundles are resource
> bundles, that have at least one resource running and all resources
> located on the same node. This distinction is needed as newly created
> strict positive resource affinity rules may still require some resource
> motions to enforce the rule.
>
> Additionally, the migration candidate generation prunes any target
> nodes, which do not adhere to the HA rules of these resource bundles
> before scoring these migration candidates.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - change imbalance threshold default value from 0.7 to 0.3
> - use sprintf() for float number printing instead of perly rounding
> logic
> - print before and expected after values
> - implement PVE::HA::Usage::Basic rebalancing methods as well with
> sensible return values, but which are only used to not throw errors if
> a failback from 'dynamic'/'static' to 'basic' happens in
> recompute_online_node_usage()
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (18 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
` (9 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document which resource bundles count as active and
stationary and ensure that get_active_stationary_resource_bundles(...)
produces the correct active, stationary resource bundles.
This is especially important because these resource bundles are used
for the load balancing candidate generation, which is passed to
score_best_balancing_migration_candidates($candidates, ...). The
PVE::HA::Usage::{Static,Dynamic} implementation validates these
candidates and fails with a user-visible error message.
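As a rough illustration of what these test cases exercise, the grouping
behavior of get_active_stationary_resource_bundles(...) can be sketched
as follows. This is a simplified Python model of the documented
semantics, not the Perl code itself:

```python
def active_stationary_bundles(ss, positive_affinity):
    """Group started resources into bundles keyed by the alphabetically first
    started member (the "leader"); skip bundles that span several nodes or
    contain a moving (migrate/relocate) member."""
    bundles = {}
    for sid in sorted(ss):
        if ss[sid]['state'] != 'started':
            continue  # a non-started resource cannot lead an active bundle
        members, nodes, moving = [sid], {ss[sid]['node']}, False
        for csid in positive_affinity.get(sid, {}):
            if csid not in ss:
                continue
            state, node = ss[csid]['state'], ss[csid]['node']
            if state in ('migrate', 'relocate'):
                moving = True  # a dependent resource moves -> not stationary
                break
            if state != 'started':
                continue  # stopped dependents are left out of the bundle
            nodes.add(node)
            members.append(csid)
        if moving or len(nodes) > 1:
            continue  # bundle is moving or not yet on a single node
        members = sorted(members)
        bundles[members[0]] = members
    return bundles
```

For example, with vm:101 stopped and vm:102/vm:103 started on node1 in a
common positive affinity rule, the resulting bundle is keyed by vm:102
and contains only the two started resources, matching the "resource
bundle with first resource stopped" test case.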
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/test/Makefile | 1 +
src/test/test_resource_bundles.pl | 234 ++++++++++++++++++++++++++++++
2 files changed, 235 insertions(+)
create mode 100755 src/test/test_resource_bundles.pl
diff --git a/src/test/Makefile b/src/test/Makefile
index 6da9e100..f72b755b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -6,6 +6,7 @@ test:
@echo "-- start regression tests --"
./test_failover1.pl
./test_rules_config.pl
+ ./test_resource_bundles.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
diff --git a/src/test/test_resource_bundles.pl b/src/test/test_resource_bundles.pl
new file mode 100755
index 00000000..d38dc516
--- /dev/null
+++ b/src/test/test_resource_bundles.pl
@@ -0,0 +1,234 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use Test::More;
+
+use PVE::HA::Manager;
+
+my $get_active_stationary_resource_bundle_tests = [
+ {
+ description => "trivial resource bundles",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {},
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101',
+ ],
+ 'vm:102' => [
+ 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "simple resource bundle",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with first resource stopped",
+ services => {
+ 'vm:101' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:102' => [
+ 'vm:102', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with some stopped resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with moving resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'migrate',
+ node => 'node2',
+ target => 'node1',
+ },
+ 'vm:103' => {
+ state => 'relocate',
+ node => 'node3',
+ target => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+ # might happen if the resource bundle is generated even before the HA Manager
+ # puts the HA resources in migrate/relocate to make them adhere to the HA rules
+ {
+ description => "resource bundle with resources on different nodes",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node2',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node3',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+];
+
+my $tests = [
+ @$get_active_stationary_resource_bundle_tests,
+];
+
+plan(tests => scalar($tests->@*));
+
+for my $case ($get_active_stationary_resource_bundle_tests->@*) {
+ my ($ss, $resource_affinity) = $case->@{qw(services resource_affinity)};
+
+ my $result = PVE::HA::Manager::get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ is_deeply($result, $case->{resource_bundles}, $case->{description});
+}
+
+done_testing();
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread* [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (19 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:21 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
` (8 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the automatic load
rebalancer using the dynamic usage stats.
As an overview:
- Case 0: rebalancing system is inactive for no configured HA resources
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance, and converges once
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through dynamic
changes in their usage
- Case 4: rebalancing system doesn't trigger a migration if the node
imbalance threshold is exceeded once but isn't sustained for at least
the set hold duration
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 was changed to have an initial imbalance below 0.3 instead of
an imbalance below 0.7 as before
- case 4 was changed to have an imbalance below 0.3 instead of below 0.7
to not trigger a rebalancing migration during the transient spike
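
As a note for reviewers: the imbalance values in the expected logs below
(e.g. 1.41 -> 0.94 -> 0.35 in the rebalance2 case) are consistent with the
coefficient of variation (population standard deviation divided by mean) of
the per-node usage. The sketch below is inferred from those log values only,
not taken from the scheduler implementation, and uses abstract relative
usage units:

```python
import statistics

def imbalance(node_usages):
    # Coefficient of variation: population stddev / mean of per-node usage.
    # Inferred interpretation of the "imbalance" values in log.expect.
    mean = statistics.fmean(node_usages)
    if mean == 0:
        return 0.0
    return statistics.pstdev(node_usages) / mean

# rebalance2: four equal services start on node1 of three homogeneous nodes.
print(round(imbalance([4, 0, 0]), 2))  # 1.41 (before any migration)
print(round(imbalance([3, 1, 0]), 2))  # 0.94 (after vm:101 -> node2)
print(round(imbalance([2, 1, 1]), 2))  # 0.35 (after vm:102 -> node3)
```

This also shows why the single-resource case (rebalance1) stays put: with
one service, every candidate migration just moves the same usage to another
node and cannot improve the coefficient of variation.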
.../test-crs-dynamic-auto-rebalance0/README | 2 +
.../test-crs-dynamic-auto-rebalance0/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../test-crs-dynamic-auto-rebalance1/README | 7 ++
.../test-crs-dynamic-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-dynamic-auto-rebalance2/README | 4 +
.../test-crs-dynamic-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-dynamic-auto-rebalance3/README | 4 +
.../test-crs-dynamic-auto-rebalance3/cmdlist | 16 ++++
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 89 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
.../test-crs-dynamic-auto-rebalance4/README | 11 +++
.../test-crs-dynamic-auto-rebalance4/cmdlist | 13 +++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
45 files changed, 459 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/README b/src/test/test-crs-dynamic-auto-rebalance0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/log.expect b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/manager_status b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/service_config b/src/test/test-crs-dynamic-auto-rebalance0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/README b/src/test/test-crs-dynamic-auto-rebalance1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/service_config b/src/test/test-crs-dynamic-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/README b/src/test/test-crs-dynamic-auto-rebalance2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources from a single node to
+other cluster nodes to reach minimal cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
new file mode 100644
index 00000000..3d79026e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/service_config b/src/test/test-crs-dynamic-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/README b/src/test/test-crs-dynamic-auto-rebalance3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
new file mode 100644
index 00000000..275f7aec
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected change for imbalance from 0.85 to 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 240 node1/crm: auto rebalance - migrate vm:103 to node1 (expected change for imbalance from 0.81 to 0.40)
+info 240 node1/crm: got crm command: migrate vm:103 node1
+info 240 node1/crm: migrate service 'vm:103' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 243 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 243 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 260 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 261 node1/lrm: starting service vm:103
+info 261 node1/lrm: service status vm:103 started
+info 320 node1/crm: auto rebalance - migrate vm:105 to node3 (expected change for imbalance from 0.40 to 0.21)
+info 320 node1/crm: got crm command: migrate vm:105 node3
+info 320 node1/crm: migrate service 'vm:105' to node 'node3'
+info 320 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 323 node2/lrm: service vm:105 - start migrate to node 'node3'
+info 323 node2/lrm: service vm:105 - end migrate to node 'node3'
+info 340 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node3)
+info 345 node3/lrm: starting service vm:105
+info 345 node3/lrm: service status vm:105 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/service_config b/src/test/test-crs-dynamic-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/README b/src/test/test-crs-dynamic-auto-rebalance4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage spikes transiently above the imbalance threshold but falls back below
+it before the hold duration expires.
+
+This test relies on the fact that every command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values that bring the current imbalance back below the
+threshold right after simulating the spike undercuts the hold duration by one
+HA round, so no rebalancing migration is triggered.
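The hold-duration gating described above can be sketched as follows; the function name, the threshold value, and the round-by-round model are illustrative assumptions, not the actual ha-manager implementation:

```python
# Hedged sketch (assumed names/values, not ha-manager code): a hold
# duration only allows rebalancing once the imbalance has stayed above
# the threshold for enough consecutive HA rounds.

HOLD_DURATION = 6  # rounds the imbalance must stay above the threshold
THRESHOLD = 0.3    # assumed imbalance trigger threshold


def should_rebalance(imbalance_per_round, hold_duration=HOLD_DURATION,
                     threshold=THRESHOLD):
    """Return True once the imbalance exceeds the threshold for at
    least `hold_duration` consecutive rounds."""
    streak = 0
    for imbalance in imbalance_per_round:
        if imbalance > threshold:
            streak += 1
            if streak >= hold_duration:
                return True
        else:
            streak = 0  # spike ended, reset the counter
    return False


# A spike lasting 5 rounds (one cmdlist batch interval) is one round
# short of the hold duration, so no migration is triggered:
spike = [0.1] + [0.8] * 5 + [0.1] * 4
assert should_rebalance(spike) is False
# A spike sustained for 6 rounds would trigger it:
assert should_rebalance([0.1] + [0.8] * 6 + [0.1]) is True
```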
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..0b1d7625
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 768",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 1538",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 1538"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..14059a3e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
new file mode 100644
index 00000000..30898f18
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 768
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 1538
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 1538
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/service_config b/src/test/test-crs-dynamic-auto-rebalance4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
* Re: [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system test cases
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-04-02 13:21 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:21 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases document the basic behavior of the automatic load
> rebalancer using the dynamic usage stats.
>
> As an overview:
>
> - Case 0: rebalancing system is inactive for no configured HA resources
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
> for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance and converge if
> the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance through dynamic
> changes in their usage
> - Case 4: rebalancing system doesn't trigger a migration if the node
> imbalance is exceeded once but isn't sustained for at least
> the set hold duration
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 was changed to have an initial imbalance below 0.3 instead of
> an imbalance below 0.7 as before
> - case 4 was changed to have an imbalance below 0.3 instead of below 0.7
> to not trigger a rebalancing migration during the transient spike
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* [PATCH ha-manager v4 22/28] test: add static automatic rebalancing system test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (20 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:23 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
` (7 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases are derivatives of the dynamic automatic rebalancing
system test cases 1 to 3, which ensure that the same basic functionality
is provided with the automatic rebalancing system with static usage
information.
The other dynamic usage test cases are not included here, because these
are invariant to the provided usage information and only test further
edge cases.
As an overview:
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converge if
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through changes
in their static usage
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 is a little more thorough now as it now uses 3 rebalancing
migrations to reach the minimum possible node imbalance
.../test-crs-static-auto-rebalance1/README | 7 ++
.../test-crs-static-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 25 +++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-static-auto-rebalance2/README | 4 +
.../test-crs-static-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 59 +++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-static-auto-rebalance3/README | 3 +
.../test-crs-static-auto-rebalance3/cmdlist | 15 +++
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 97 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
24 files changed, 291 insertions(+)
create mode 100644 src/test/test-crs-static-auto-rebalance1/README
create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance2/README
create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance3/README
create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats
diff --git a/src/test/test-crs-static-auto-rebalance1/README b/src/test/test-crs-static-auto-rebalance1/README
new file mode 100644
index 00000000..8f97ac55
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with static usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-static-auto-rebalance1/cmdlist b/src/test/test-crs-static-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance1/datacenter.cfg b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance1/hardware_status b/src/test/test-crs-static-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/log.expect b/src/test/test-crs-static-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d2c27bec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance1/manager_status b/src/test/test-crs-static-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance1/service_config b/src/test/test-crs-static-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/static_service_stats b/src/test/test-crs-static-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/README b/src/test/test-crs-static-auto-rebalance2/README
new file mode 100644
index 00000000..1d1b9d6e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with static usage information will
+rebalance multiple homogeneous HA resources running on a single node onto the
+other cluster nodes to reach a minimum cluster node imbalance in a homogeneous
+cluster.
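One way such a rebalancer could pick migrations can be sketched as follows; the imbalance metric (standard deviation of per-node CPU load fractions), the improvement threshold, and all names are illustrative assumptions mirroring this test's fixture data, not the actual ha-manager algorithm:

```python
from statistics import stdev

# Hedged sketch: greedily pick the migration that most reduces a node
# imbalance metric, but only if it improves it by a minimum amount.
# Values mirror the test fixtures above (4 homogeneous services on node1).

nodes = {"node1": 24.0, "node2": 24.0, "node3": 24.0}  # maxcpu per node
placement = {"vm:101": "node1", "vm:102": "node1",
             "vm:103": "node1", "vm:104": "node1"}
maxcpu = {svc: 2.0 for svc in placement}  # homogeneous services


def imbalance(placement):
    """Standard deviation of the per-node CPU load fractions."""
    load = {n: 0.0 for n in nodes}
    for svc, node in placement.items():
        load[node] += maxcpu[svc]
    return stdev(load[n] / nodes[n] for n in nodes)


def best_migration(placement, min_improvement=0.01):
    """Return (service, target, new_imbalance) for the single migration
    with the lowest resulting imbalance, or None if no migration beats
    the minimum improvement threshold."""
    base = imbalance(placement)
    best = None
    for svc, src in placement.items():
        for dst in nodes:
            if dst == src:
                continue
            score = imbalance({**placement, svc: dst})
            if base - score > min_improvement and (
                    best is None or score < best[2]):
                best = (svc, dst, score)
    return best


move = best_migration(placement)
# Moving any one service off the overloaded node1 lowers the imbalance,
# so a migration is proposed; repeating this converges like the test log.
assert move is not None and move[1] in ("node2", "node3")
```

With a single configured resource (as in the rebalance1 test above), every candidate migration merely shifts the load to another node without reducing the spread enough, so `best_migration` would return `None` and no migration is triggered.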
diff --git a/src/test/test-crs-static-auto-rebalance2/cmdlist b/src/test/test-crs-static-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance2/datacenter.cfg b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance2/hardware_status b/src/test/test-crs-static-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/log.expect b/src/test/test-crs-static-auto-rebalance2/log.expect
new file mode 100644
index 00000000..6a2ab89f
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance2/manager_status b/src/test/test-crs-static-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-auto-rebalance2/service_config b/src/test/test-crs-static-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/static_service_stats b/src/test/test-crs-static-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/README b/src/test/test-crs-static-auto-rebalance3/README
new file mode 100644
index 00000000..2f57dac2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/README
@@ -0,0 +1,3 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running HA resources, where the static usage stats of some
+HA resources change over time, to reach minimum cluster node imbalance.
diff --git a/src/test/test-crs-static-auto-rebalance3/cmdlist b/src/test/test-crs-static-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..f18798b0
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/cmdlist
@@ -0,0 +1,15 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:106 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:107 set-static-stats maxcpu 8.0 maxmem 8192"
+ ],
+ [
+ "service vm:101 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:102 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:103 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:104 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:105 set-static-stats maxcpu 1.0 maxmem 1024"
+ ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance3/datacenter.cfg b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance3/hardware_status b/src/test/test-crs-static-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/log.expect b/src/test/test-crs-static-auto-rebalance3/log.expect
new file mode 100644
index 00000000..ecf2d183
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/log.expect
@@ -0,0 +1,97 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:106 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:107 set-static-stats maxcpu 8.0 maxmem 8192
+info 160 node1/crm: auto rebalance - migrate vm:105 to node1 (expected change for imbalance from 0.88 to 0.47)
+info 160 node1/crm: got crm command: migrate vm:105 node1
+info 160 node1/crm: migrate service 'vm:105' to node 'node1'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node1'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node1'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 181 node1/lrm: starting service vm:105
+info 181 node1/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:102 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:103 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:104 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:105 set-static-stats maxcpu 1.0 maxmem 1024
+info 240 node1/crm: auto rebalance - migrate vm:106 to node2 (expected change for imbalance from 0.91 to 0.42)
+info 240 node1/crm: got crm command: migrate vm:106 node2
+info 240 node1/crm: migrate service 'vm:106' to node 'node2'
+info 240 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 245 node3/lrm: service vm:106 - start migrate to node 'node2'
+info 245 node3/lrm: service vm:106 - end migrate to node 'node2'
+info 260 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node2)
+info 263 node2/lrm: starting service vm:106
+info 263 node2/lrm: service status vm:106 started
+info 320 node1/crm: auto rebalance - migrate vm:103 to node1 (expected change for imbalance from 0.42 to 0.31)
+info 320 node1/crm: got crm command: migrate vm:103 node1
+info 320 node1/crm: migrate service 'vm:103' to node 'node1'
+info 320 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 323 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 323 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 340 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 341 node1/lrm: starting service vm:103
+info 341 node1/lrm: service status vm:103 started
+info 400 node1/crm: auto rebalance - migrate vm:104 to node1 (expected change for imbalance from 0.31 to 0.20)
+info 400 node1/crm: got crm command: migrate vm:104 node1
+info 400 node1/crm: migrate service 'vm:104' to node 'node1'
+info 400 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 403 node2/lrm: service vm:104 - start migrate to node 'node1'
+info 403 node2/lrm: service vm:104 - end migrate to node 'node1'
+info 420 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node1)
+info 421 node1/lrm: starting service vm:104
+info 421 node1/lrm: service status vm:104 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance3/manager_status b/src/test/test-crs-static-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance3/service_config b/src/test/test-crs-static-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/static_service_stats b/src/test/test-crs-static-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..560a6fe8
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:105": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:106": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:107": { "maxcpu": 2.0, "maxmem": 2147483648 }
+}
--
2.47.3
^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH ha-manager v4 22/28] test: add static automatic rebalancing system test cases
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
@ 2026-04-02 13:23 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:23 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this:
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases are derivatives of the dynamic automatic rebalancing
> system test cases 1 to 3, which ensure that the same basic functionality
> is provided by the automatic rebalancing system using static usage
> information.
>
> The other dynamic usage test cases are not included here, because these
> are invariant to the provided usage information and only test further
> edge cases.
>
> As an overview:
>
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
> for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance and converge if
> the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance through changes
> in their static usage
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 is a little more thorough now as it now uses 3 rebalancing
> migrations to reach the minimum possible node imbalance
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (21 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:29 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
` (6 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases are clones of the dynamic automatic rebalancing system
test cases 0 through 4, which ensure that the same basic functionality
is provided by the automatic rebalancing system using the TOPSIS
method.
The expected outputs are exactly the same, except for test case 3, which
changes the second migration from
vm:103 to node1 with an expected target imbalance of 0.40
to
vm:103 to node3 with an expected target imbalance of 0.43.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 was changed to have an initial imbalance below 0.3 instead of
an imbalance below 0.7 as before
- case 4 was changed to have an imbalance below 0.3 instead of below 0.7
so that no rebalancing migration is triggered during the transient spike
.../README | 2 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../README | 7 ++
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../README | 4 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 4 +
.../cmdlist | 16 ++++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 89 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
.../README | 11 +++
.../cmdlist | 13 +++
.../datacenter.cfg | 9 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
45 files changed, 464 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/README b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/README b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/README b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources on a single node to other
+cluster nodes to reach a minimum cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
new file mode 100644
index 00000000..3d79026e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/README b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
new file mode 100644
index 00000000..c9fc29e0
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected change for imbalance from 0.85 to 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 240 node1/crm: auto rebalance - migrate vm:103 to node3 (expected change for imbalance from 0.81 to 0.43)
+info 240 node1/crm: got crm command: migrate vm:103 node3
+info 240 node1/crm: migrate service 'vm:103' to node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 243 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 243 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 260 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 265 node3/lrm: starting service vm:103
+info 265 node3/lrm: service status vm:103 started
+info 320 node1/crm: auto rebalance - migrate vm:105 to node1 (expected change for imbalance from 0.43 to 0.24)
+info 320 node1/crm: got crm command: migrate vm:105 node1
+info 320 node1/crm: migrate service 'vm:105' to node 'node1'
+info 320 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 323 node2/lrm: service vm:105 - start migrate to node 'node1'
+info 323 node2/lrm: service vm:105 - end migrate to node 'node1'
+info 340 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 341 node1/lrm: starting service vm:105
+info 341 node1/lrm: service status vm:105 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/README b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger any rebalancing migration for running HA resources whose dynamic usage
+shows a transient spike: the spike pushes the node imbalance above the
+threshold, but the imbalance falls below the threshold again before the hold
+duration expires.
+
+This test relies on the fact that command batches in the `cmdlist` file are
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values that bring the imbalance back below the threshold
+in the very next batch after the spike undercuts the hold duration by one HA
+round, so no rebalancing migration is triggered.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
new file mode 100644
index 00000000..0b1d7625
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 768",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 1538",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 1538"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
new file mode 100644
index 00000000..0fb3fdc3
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis",
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
new file mode 100644
index 00000000..30898f18
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 768
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 1538
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 1538
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
* Re: [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-04-02 13:29 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:29 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases are clones of the dynamic automatic rebalancing system
> test cases 0 through 4, which ensure that the same basic functionality
> is provided with the automatic rebalancing system using the TOPSIS
> method.
>
> The expected outputs are exactly the same, but for test case 3, which
> changes the second migration from
>
> vm:103 to node1 with an expected target imbalance of 0.40
>
> to
>
> vm:103 to node3 with an expected target imbalance of 0.43.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 was changed to have an initial imbalance below 0.3 instead of
> an imbalance below 0.7 as before
> - case 4 was changed to have an imbalance below 0.3 instead of below 0.7
> to not trigger a rebalancing migration inbetween the transient spike
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
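As background for reviewers unfamiliar with the method: the candidate ranking behind `ha-auto-rebalance-method=topsis` follows the standard TOPSIS procedure (normalize, weight, measure distance to the ideal-best and ideal-worst alternatives, rank by relative closeness). The sketch below is a minimal generic illustration, not the ha-manager implementation; the criteria, weights, and candidate values are made-up assumptions.

```python
import math

def topsis_rank(matrix, weights, benefit):
    """Rank alternatives (rows of `matrix`) by TOPSIS relative closeness.

    weights: one weight per criterion column (should sum to 1).
    benefit: one flag per column; True = higher is better, False = cost.
    Returns candidate row indices, best first.
    """
    cols = list(zip(*matrix))
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(v * v for v in col)) or 1.0 for col in cols]
    weighted = [[w * v / n for v, w, n in zip(row, weights, norms)]
                for row in matrix]
    wcols = list(zip(*weighted))
    # Ideal-best and ideal-worst virtual alternatives per criterion.
    best = [max(c) if b else min(c) for c, b in zip(wcols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(wcols, benefit)]

    def dist(row, ref):
        return math.sqrt(sum((v - r) ** 2 for v, r in zip(row, ref)))

    # Relative closeness: 1.0 = at the ideal, 0.0 = at the anti-ideal.
    scores = []
    for row in weighted:
        d_best, d_worst = dist(row, best), dist(row, worst)
        scores.append(d_worst / (d_best + d_worst) if d_best + d_worst else 0.0)
    return sorted(range(len(matrix)), key=scores.__getitem__, reverse=True)

# Hypothetical candidate migrations, scored by the expected post-migration
# CPU and memory imbalance (both cost criteria: lower is better).
candidates = [
    [0.42, 0.50],   # e.g. migrate vm:105 to node2
    [0.85, 0.90],   # e.g. keep everything in place
    [0.60, 0.40],   # e.g. migrate vm:106 to node1
]
ranking = topsis_rank(candidates, weights=[0.5, 0.5], benefit=[False, False])
```

With these made-up numbers the first candidate wins because it improves both criteria most evenly, which mirrors how the expected-imbalance figures in the `log.expect` files above drive the choice of migration.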
* [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (22 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable Daniel Kral
` (5 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document and verify some behaviors of the automatic
rebalancing system in combination with HA affinity rules.
All of these test cases use only dynamic usage information and the
bruteforce method, as waiting on ongoing migrations and candidate
generation are invariant to those parameters.
As an overview:
- Case 1: rebalancing system acknowledges node affinity rules
- Case 2: rebalancing system considers HA resources in strict positive
resource affinity rules as a single unit (a resource bundle)
and will not split them apart
- Case 3: rebalancing system will wait on the migration of a not-yet
enforced strict positive resource affinity rule, i.e., the
HA resources still need to migrate to their common node
- Case 4: rebalancing system will acknowledge strict negative resource
affinity rules, but will still try to minimize the node
imbalance as much as possible
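The four behaviors above boil down to constraining candidate generation: resources bound by a strict positive rule form one bundle that moves as a unit (case 2), a bundle split across nodes is skipped until its migrations settle (case 3), and a target is rejected if a negative-affinity peer already sits there (case 4). The sketch below is a rough illustration of that filtering under those assumptions; the function name, rule representation, and data all are hypothetical and not the ha-manager API.

```python
def migration_candidates(placement, positive_rules, negative_rules, nodes):
    """Yield (bundle, target) moves that respect strict affinity rules.

    placement:      dict resource -> current node
    positive_rules: sets of resources that must share a node
    negative_rules: sets of resources that must be on pairwise-distinct nodes
    """
    # Merge resources connected by strict positive rules into one bundle.
    bundles = {res: frozenset([res]) for res in placement}
    for group in positive_rules:
        merged = frozenset().union(*(bundles[r] for r in group))
        for r in merged:
            bundles[r] = merged
    for bundle in set(bundles.values()):
        srcs = {placement[r] for r in bundle}
        if len(srcs) != 1:
            continue  # rule not yet enforced: wait, don't rebalance (case 3)
        (current,) = srcs
        for target in nodes:
            if target == current:
                continue
            blocked = any(
                placement[other] == target
                for rule in negative_rules if not rule.isdisjoint(bundle)
                for other in rule - bundle
            )
            if not blocked:
                yield bundle, target  # whole bundle moves as one unit (case 2)

# Hypothetical example: vm:101/vm:102 are positively bound, vm:101 and
# vm:103 must stay apart, so node3 is the only admissible target for both.
placement = {"vm:101": "node1", "vm:102": "node1", "vm:103": "node2"}
cands = set(migration_candidates(
    placement,
    positive_rules=[{"vm:101", "vm:102"}],
    negative_rules=[{"vm:101", "vm:103"}],
    nodes=["node1", "node2", "node3"],
))
```

The rebalancer would then score only these admissible moves by expected imbalance, which is why case 4 can still reduce imbalance without ever violating a negative rule.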
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
.../README | 7 +++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 5 ++
.../hardware_status | 5 ++
.../log.expect | 49 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 ++
.../service_config | 5 ++
.../static_service_stats | 5 ++
.../README | 12 ++++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 4 ++
.../hardware_status | 5 ++
.../log.expect | 53 +++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 4 ++
.../static_service_stats | 4 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 8 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 31 ++++++++++
.../rules_config | 3 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 7 +++
.../service_config | 6 ++
.../static_service_stats | 6 ++
40 files changed, 452 insertions(+)
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/README b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
new file mode 100644
index 00000000..8504755f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance running HA resources that cause a node imbalance exceeding the
+threshold, because their strict HA node affinity rules require them to be
+kept on specific nodes.
+
+As a sanity check, the added HA resource, which is not part of the node
+affinity rule, is rebalanced to another node to lower the imbalance.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..6ee04948
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:104 add node1 started 1",
+ "service vm:104 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:104 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..02133ab0
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
new file mode 100644
index 00000000..c9267997
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
@@ -0,0 +1,49 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:104 add node1 started 1
+info 120 cmdlist: execute service vm:104 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:104 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:104' on node 'node1'
+info 120 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:104 to node2 (expected change for imbalance from 1.41 to 0.98)
+info 140 node1/crm: got crm command: migrate vm:104 node2
+info 140 node1/crm: migrate service 'vm:104' to node 'node2'
+info 140 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:104
+info 163 node2/lrm: service status vm:104 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
new file mode 100644
index 00000000..00f615e9
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-stays-on-node1
+ nodes node1
+ resources vm:101,vm:102,vm:103
+ strict 1
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
new file mode 100644
index 00000000..57e3579d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..b11cc5eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/README b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
new file mode 100644
index 00000000..be072f6d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
@@ -0,0 +1,12 @@
+Test that the auto rebalance system with dynamic usage information will
+consider running HA resources in strict positive resource affinity rules as
+bundles, which can only be moved to other nodes as a single unit.
+
+Therefore, even though the two initial HA resources cause a node imbalance in
+the cluster and would otherwise be split apart, the auto rebalance system does
+not issue a rebalancing migration, because they must stay together.
+
+As a sanity check, adding another HA resource, which is not part of the strict
+positive resource affinity rule, causes a rebalancing migration: in this case
+the whole resource bundle is moved, because its leading resource 'vm:101' is
+alphabetically first.
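The bundling behaviour this test exercises can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual ha-manager Perl code; the function and variable names are hypothetical:

```python
def affinity_bundles(resources, positive_rules):
    """Group resource IDs into bundles: resources joined (transitively) by
    strict positive affinity rules end up in the same bundle, which the
    rebalancer may only move as a single unit."""
    # union-find over resource IDs
    parent = {rid: rid for rid in resources}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for rule in positive_rules:
        members = [r for r in rule if r in parent]
        for a, b in zip(members, members[1:]):
            parent[find(a)] = find(b)

    bundles = {}
    for rid in resources:
        bundles.setdefault(find(rid), set()).add(rid)
    # sort each bundle so its alphabetically first member leads, mirroring
    # the "leading resource" wording in the test description
    return sorted(sorted(b) for b in bundles.values())

bundles = affinity_bundles(
    ["vm:101", "vm:102", "vm:103"],
    [["vm:101", "vm:102"]],
)
# bundles == [["vm:101", "vm:102"], ["vm:103"]]
```

With this grouping, moving vm:101 implies moving vm:102 as well, which matches the paired migrations visible in the expected log above.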
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..61373367
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:103 add node1 started 1",
+ "service vm:103 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:103 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..4f81dfe2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
new file mode 100644
index 00000000..26be9421
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:103 add node1 started 1
+info 120 cmdlist: execute service vm:103 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:103 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:103' on node 'node1'
+info 120 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.86)
+info 140 node1/crm: got crm command: migrate vm:101 node2
+info 140 node1/crm: crm command 'migrate vm:101 node2' - migrate service 'vm:102' to node 'node2' (service 'vm:102' in positive affinity with service 'vm:101')
+info 140 node1/crm: migrate service 'vm:101' to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 140 node1/crm: migrate service 'vm:102' to node 'node2'
+info 140 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 160 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:101
+info 163 node2/lrm: service status vm:101 started
+info 163 node2/lrm: starting service vm:102
+info 163 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
new file mode 100644
index 00000000..e1948a00
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:102
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
new file mode 100644
index 00000000..880e0a59
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..455ae043
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/README b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
new file mode 100644
index 00000000..4b4d4855
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information waits for a
+resource migration to finish while a strict positive resource affinity rule is
+not yet fully enforced.
+
+This test case manipulates the manager status so that the HA Manager assumes
+that the not-yet-migrated HA resource in the strict positive resource affinity
+rule is still migrating, since the integration tests currently do not support
+prolonged migrations.
+
+Furthermore, setting the hold duration to 0 forces auto rebalancing migrations
+to be issued as soon as possible. This ensures that if the auto rebalance
+system did not wait for the ongoing migration, the rebalancing migration would
+be issued right away in the same round in which the HA resources are
+acknowledged as running.
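The waiting behaviour described above amounts to a guard like the following. This is an illustrative Python sketch, not the actual Perl implementation; the `can_rebalance` name is hypothetical, while the 'migrate' state mirrors the one visible in the manager_status fixture below:

```python
def can_rebalance(bundle, service_status):
    """Only consider a bundle for rebalancing once none of its members is
    in an ongoing motion; otherwise defer to a later scheduling round."""
    blocking = {"migrate", "relocate"}  # assumed set of in-motion states
    return all(
        service_status.get(rid, {}).get("state") not in blocking
        for rid in bundle
    )

status = {
    "vm:101": {"node": "node2", "state": "migrate", "target": "node1"},
    "vm:103": {"node": "node1", "state": "started"},
}
can_rebalance(["vm:101", "vm:103"], status)  # False: vm:101 still migrating
```

Once vm:101 is acknowledged as started on node1, the guard passes and the rebalancer is free to move the whole bundle, as the expected log shows.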
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..181ea848
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 0
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..d35a2c8f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 },
+ "vm:104": { "cpu": 4.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
new file mode 100644
index 00000000..35282c7d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 23 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 60 node1/crm: auto rebalance - migrate vm:102 to node2 (expected change for imbalance from 1.41 to 0.72)
+info 60 node1/crm: got crm command: migrate vm:102 node2
+info 60 node1/crm: migrate service 'vm:102' to node 'node2'
+info 60 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 61 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 61 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 80 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 83 node2/lrm: starting service vm:102
+info 83 node2/lrm: service status vm:102 started
+info 100 node1/crm: auto rebalance - migrate vm:101 to node3 (expected change for imbalance from 0.72 to 0.27)
+info 100 node1/crm: got crm command: migrate vm:101 node3
+info 100 node1/crm: crm command 'migrate vm:101 node3' - migrate service 'vm:103' to node 'node3' (service 'vm:103' in positive affinity with service 'vm:101')
+info 100 node1/crm: migrate service 'vm:101' to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 100 node1/crm: migrate service 'vm:103' to node 'node3'
+info 100 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 101 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 105 node3/lrm: got lock 'ha_agent_node3_lock'
+info 105 node3/lrm: status change wait_for_agent_lock => active
+info 120 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 120 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 125 node3/lrm: starting service vm:101
+info 125 node3/lrm: service status vm:101 started
+info 125 node3/lrm: starting service vm:103
+info 125 node3/lrm: service status vm:103 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
new file mode 100644
index 00000000..cf90037c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
@@ -0,0 +1,31 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"online",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node2",
+ "state": "migrate",
+ "target": "node1",
+ "uid": "RoPGTlvNYq/oZFokv9fgWw"
+ },
+ "vm:102": {
+ "node": "node1",
+ "state": "started",
+ "uid": "fR3i18EHk6DhF8Zd2jddNX"
+ },
+ "vm:103": {
+ "node": "node1",
+ "state": "started",
+ "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+ },
+ "vm:104": {
+ "node": "node1",
+ "state": "started",
+ "uid": "23hk23EHk6DhF8Zd0218DD"
+ }
+ }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
new file mode 100644
index 00000000..2c3f3171
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:103
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
new file mode 100644
index 00000000..3dadaabc
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/README b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
new file mode 100644
index 00000000..e304cc22
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance an HA resource onto the same node as another HA resource that it is
+in a strict negative resource affinity rule with.
+
+There is a high node imbalance, since vm:101 and vm:102 on node1 cause a
+higher usage than node2 and node3 have. Even though it would be ideal to move
+one of them to node2, which has a very low usage, neither can be moved there,
+as vm:101 and vm:102 are each in a strict negative resource affinity rule with
+an HA resource on node2.
+
+To minimize the imbalance in the cluster, one of the HA resources on node1 is
+migrated to node3 first; afterwards the HA resource on node3, which is not in
+a strict negative resource affinity rule with an HA resource on node2, is
+migrated to node2.
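The target-node filtering this test relies on can be sketched as follows. This is an illustrative Python sketch, not the actual ha-manager code; the `allowed_targets` name and data shapes are hypothetical, while the placement and rules match this test's fixtures:

```python
def allowed_targets(rid, nodes, placement, negative_rules):
    """Candidate nodes a resource may be rebalanced to: exclude every node
    already running a resource it shares a strict negative affinity rule
    with."""
    forbidden = set()
    for rule in negative_rules:
        if rid in rule:
            forbidden |= {placement[other] for other in rule if other != rid}
    return [n for n in nodes if n not in forbidden]

placement = {"vm:101": "node1", "vm:102": "node1",
             "vm:103": "node2", "vm:104": "node3"}
rules = [{"vm:101", "vm:103"}, {"vm:102", "vm:103"}]

allowed_targets("vm:101", ["node1", "node2", "node3"], placement, rules)
# ["node1", "node3"]: node2 is excluded because vm:103 runs there
```

This is why the rebalancer first moves a node1 resource to node3 and only then frees up node3 by moving the unconstrained vm:104 to node2.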
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..083f338b
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 4294967296 },
+ "vm:102": { "cpu": 2.4, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.0, "mem": 0 },
+ "vm:104": { "cpu": 1.0, "mem": 1073741824 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
new file mode 100644
index 00000000..cd87f3a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 80 node1/crm: auto rebalance - migrate vm:101 to node3 (expected change for imbalance from 1.04 to 0.72)
+info 80 node1/crm: got crm command: migrate vm:101 node3
+info 80 node1/crm: migrate service 'vm:101' to node 'node3'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 105 node3/lrm: starting service vm:101
+info 105 node3/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:104 to node2 (expected change for imbalance from 0.72 to 0.33)
+info 160 node1/crm: got crm command: migrate vm:104 node2
+info 160 node1/crm: migrate service 'vm:104' to node 'node2'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:104 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:104 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:104
+info 183 node2/lrm: service status vm:104 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
new file mode 100644
index 00000000..eef5460f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
@@ -0,0 +1,7 @@
+resource-affinity: vms-stay-apart1
+ resources vm:101,vm:103
+ affinity negative
+
+resource-affinity: vms-stay-apart2
+ resources vm:102,vm:103
+ affinity negative
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
new file mode 100644
index 00000000..16bffacf
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (23 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:33 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs Daniel Kral
` (4 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Suggested-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
www/manager6/dc/OptionView.js | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index e80c6457..7136c914 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -195,8 +195,8 @@ Ext.define('PVE.dc.OptionView', {
value: '__default__',
comboItems: [
['__default__', Proxmox.Utils.defaultText + ' (basic)'],
- ['basic', 'Basic (Resource Count)'],
- ['static', 'Static Load'],
+ ['basic', gettext('Basic (Resource Count)')],
+ ['static', gettext('Static Load')],
],
defaultValue: '__default__',
},
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (24 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:33 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component Daniel Kral
` (3 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new! (but nothing changed since v1)
www/manager6/dc/OptionView.js | 1 +
1 file changed, 1 insertion(+)
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index 7136c914..fa87832b 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -197,6 +197,7 @@ Ext.define('PVE.dc.OptionView', {
['__default__', Proxmox.Utils.defaultText + ' (basic)'],
['basic', gettext('Basic (Resource Count)')],
['static', gettext('Static Load')],
+ ['dynamic', gettext('Dynamic Load')],
],
defaultValue: '__default__',
},
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (25 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:35 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
` (2 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
This is in preparation for the next patch, which adds a view model to the
component so that options can depend on each other's state.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
www/manager6/Makefile | 1 +
www/manager6/dc/OptionView.js | 38 +++++-----------------
www/manager6/form/CRSOptions.js | 56 +++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 31 deletions(-)
create mode 100644 www/manager6/form/CRSOptions.js
diff --git a/www/manager6/Makefile b/www/manager6/Makefile
index 4558d53e..b0c94b32 100644
--- a/www/manager6/Makefile
+++ b/www/manager6/Makefile
@@ -27,6 +27,7 @@ JSSRC= \
form/BridgeSelector.js \
form/BusTypeSelector.js \
form/CPUModelSelector.js \
+ form/CRSOptions.js \
form/CacheTypeSelector.js \
form/CalendarEvent.js \
form/CephPoolSelector.js \
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index fa87832b..7c8d3792 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -180,38 +180,14 @@ Ext.define('PVE.dc.OptionView', {
},
],
});
- me.add_inputpanel_row('crs', gettext('Cluster Resource Scheduling'), {
+ me.rows.crs = {
+ required: true,
renderer: PVE.Utils.render_as_property_string,
- width: 450,
- labelWidth: 120,
- url: '/api2/extjs/cluster/options',
- onlineHelp: 'ha_manager_crs',
- items: [
- {
- xtype: 'proxmoxKVComboBox',
- name: 'ha',
- fieldLabel: gettext('HA Scheduling'),
- deleteEmpty: false,
- value: '__default__',
- comboItems: [
- ['__default__', Proxmox.Utils.defaultText + ' (basic)'],
- ['basic', gettext('Basic (Resource Count)')],
- ['static', gettext('Static Load')],
- ['dynamic', gettext('Dynamic Load')],
- ],
- defaultValue: '__default__',
- },
- {
- xtype: 'proxmoxcheckbox',
- name: 'ha-rebalance-on-start',
- fieldLabel: gettext('Rebalance on Start'),
- boxLabel: gettext(
- 'Use CRS to select the least loaded node when starting an HA service',
- ),
- value: 0,
- },
- ],
- });
+ header: gettext('Cluster Resource Scheduling'),
+ editor: {
+ xtype: 'pveCRSOptions',
+ },
+ };
me.add_inputpanel_row('u2f', gettext('U2F Settings'), {
renderer: (v) =>
!v ? Proxmox.Utils.NoneText : Ext.htmlEncode(PVE.Parser.printPropertyString(v)),
diff --git a/www/manager6/form/CRSOptions.js b/www/manager6/form/CRSOptions.js
new file mode 100644
index 00000000..b22c5c99
--- /dev/null
+++ b/www/manager6/form/CRSOptions.js
@@ -0,0 +1,56 @@
+Ext.define('PVE.form.CRSOptions', {
+ extend: 'Proxmox.window.Edit',
+ alias: 'widget.pveCRSOptions',
+
+ width: 450,
+ url: '/api2/extjs/cluster/options',
+ onlineHelp: 'ha_manager_crs',
+
+ fieldDefaults: {
+ labelWidth: 120,
+ },
+
+ setValues: function (values) {
+ Ext.Array.each(this.query('inputpanel'), (panel) => {
+ panel.setValues(values.crs);
+ });
+ },
+
+ items: [
+ {
+ xtype: 'inputpanel',
+ onGetValues: function (values) {
+ if (values === undefined || Object.keys(values).length === 0) {
+ return { delete: 'crs' };
+ } else {
+ return { crs: PVE.Parser.printPropertyString(values) };
+ }
+ },
+ items: [
+ {
+ xtype: 'proxmoxKVComboBox',
+ name: 'ha',
+ fieldLabel: gettext('HA Scheduling'),
+ deleteEmpty: false,
+ value: '__default__',
+ comboItems: [
+ ['__default__', Proxmox.Utils.defaultText + ' (basic)'],
+ ['basic', gettext('Basic (Resource Count)')],
+ ['static', gettext('Static Load')],
+ ['dynamic', gettext('Dynamic Load')],
+ ],
+ defaultValue: '__default__',
+ },
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'ha-rebalance-on-start',
+ fieldLabel: gettext('Rebalance on Start'),
+ boxLabel: gettext(
+ 'Use CRS to select the least loaded node when starting an HA service',
+ ),
+ value: 0,
+ },
+ ],
+ },
+ ],
+});
--
2.47.3
* [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (26 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:38 ` Dominik Rusovac
2026-04-02 14:24 ` [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Dominik Rusovac
2026-04-02 16:07 ` applied: " Thomas Lamprecht
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
ha-auto-rebalance-{threshold,method,hold-duration,margin} require
ha-auto-rebalance to be enabled in the schema, therefore they are
disabled here unless ha-auto-rebalance is enabled.
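The schema constraint described above can be pictured with a small plain-JavaScript check (a hypothetical sketch, not the actual JSON schema validation; all names except the option keys are invented):

```javascript
// The auto-rebalance sub-options are only meaningful while
// 'ha-auto-rebalance' itself is enabled in the crs config.
const AUTO_REBALANCE_SUBOPTS = [
    'ha-auto-rebalance-threshold',
    'ha-auto-rebalance-method',
    'ha-auto-rebalance-hold-duration',
    'ha-auto-rebalance-margin',
];

// Returns the sub-options that are set even though the master switch
// is off - i.e. the combinations the UI disables below.
function invalidSubOptions(crs) {
    if (crs['ha-auto-rebalance']) {
        return []; // enabled: all sub-options are allowed
    }
    return AUTO_REBALANCE_SUBOPTS.filter((opt) => opt in crs);
}
```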
The label width was enlarged a bit, so that the longer labels for the
auto rebalancing options are more readable.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
- the only changes from v1 are that a separate component is used now,
that it uses a viewModel to disable fields that shouldn't be set, and
that the label width was widened a bit;
- also 'Margin' is 'Minimum Imbalance Improvement' in the UI
www/manager6/form/CRSOptions.js | 62 ++++++++++++++++++++++++++++++++-
1 file changed, 61 insertions(+), 1 deletion(-)
diff --git a/www/manager6/form/CRSOptions.js b/www/manager6/form/CRSOptions.js
index b22c5c99..b5476bd5 100644
--- a/www/manager6/form/CRSOptions.js
+++ b/www/manager6/form/CRSOptions.js
@@ -7,7 +7,7 @@ Ext.define('PVE.form.CRSOptions', {
onlineHelp: 'ha_manager_crs',
fieldDefaults: {
- labelWidth: 120,
+ labelWidth: 150,
},
setValues: function (values) {
@@ -16,6 +16,8 @@ Ext.define('PVE.form.CRSOptions', {
});
},
+ viewModel: {},
+
items: [
{
xtype: 'inputpanel',
@@ -50,6 +52,64 @@ Ext.define('PVE.form.CRSOptions', {
),
value: 0,
},
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'ha-auto-rebalance',
+ fieldLabel: gettext('Automatic Rebalance'),
+ boxLabel: gettext('Automatically rebalance HA resources'),
+ value: 0,
+ reference: 'enableAutoRebalance',
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-threshold',
+ fieldLabel: gettext('Imbalance Threshold'),
+ emptyText: '0.3',
+ minValue: 0.0,
+ step: 0.01,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'proxmoxKVComboBox',
+ name: 'ha-auto-rebalance-method',
+ fieldLabel: gettext('Rebalancing Method'),
+ deleteEmpty: false,
+ value: '__default__',
+ comboItems: [
+ ['__default__', Proxmox.Utils.defaultText + ' (bruteforce)'],
+ ['bruteforce', 'Bruteforce'],
+ ['topsis', 'TOPSIS'],
+ ],
+ defaultValue: '__default__',
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-hold-duration',
+ fieldLabel: gettext('Hold Duration'),
+ emptyText: '3',
+ minValue: 0,
+ step: 1,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-margin',
+ fieldLabel: gettext('Minimum Imbalance Improvement'),
+ emptyText: '0.1',
+ minValue: 0.0,
+ maxValue: 1.0,
+ step: 0.01,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
],
},
],
--
2.47.3
* Re: [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
@ 2026-04-02 13:38 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:38 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> ha-auto-rebalance-{threshold,method,hold-duration,margin} require
> ha-auto-rebalance to be enabled in the schema, therefore they are
> disabled here unless ha-auto-rebalance is enabled.
good measure
>
> The label width was enlarged a bit, so that the longer labels for the
> auto rebalancing options are more readable.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - new!
> - the only changes from v1 are that a separate component is used now,
> that it uses a viewModel to disable fields that shouldn't be set, and
> that the label width was widened a bit;
> - also 'Margin' is 'Minimum Imbalance Improvement' in the UI
+1, is more comprehensible imo
>
> www/manager6/form/CRSOptions.js | 62 ++++++++++++++++++++++++++++++++-
> 1 file changed, 61 insertions(+), 1 deletion(-)
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* Re: [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (27 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
@ 2026-04-02 14:24 ` Dominik Rusovac
2026-04-02 16:07 ` applied: " Thomas Lamprecht
29 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 14:24 UTC (permalink / raw)
To: Daniel Kral, pve-devel
Reviewed all of the patches from v1 up to v4. Tested the behavior
of the CRS in a 3-node cluster and in a 7-node cluster regarding:
* disarmed HA and re-armed HA
* maintenance mode of nodes
* fenced nodes
* affinity rules
Moreover:
* used all of the variations (static or dynamic with bruteforce or
topsis)
* played around with a bunch of different thresholds, margins and hold
durations for the purpose of fine tuning the scheduler
* verified that, for example, hostnames can include hyphens
* verified that minimum requirements for number fields are detected
* used UI for setting different auto-rebalance parameters
Observations:
* scoring of the best migration cannot happen in the same round as
enabling maintenance mode; obtained the warning:
"unable to score best balancing migration - leader 'ct:205' is not present in the cluster usage"
* the sustained imbalance round counter is not reset in case of early
returns, which can, e.g., cause an auto rebalance immediately after
re-arming HA or disabling maintenance mode
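The counter observation can be illustrated with a small hypothetical sketch (all names are invented; the real counter lives in the Perl HA manager, this only models the described behavior):

```javascript
// Hypothetical model of the reported issue: if an early return skipped
// the reset, the sustained-imbalance counter would keep its old value
// and a rebalance could fire right after the skip condition clears.
// Here the counter is reset on the early-return path as well.
function makeRebalancer(holdDuration) {
    let sustainedRounds = 0;
    // One scheduler round: returns true when a rebalance should fire.
    return function round(imbalanced, earlyReturn) {
        if (earlyReturn) {
            sustainedRounds = 0; // reset on early return too
            return false;
        }
        sustainedRounds = imbalanced ? sustainedRounds + 1 : 0;
        return sustainedRounds >= holdDuration;
    };
}
```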
Looks good to me overall, I think the tiny things related to my
observations could be fixed in a small follow-up.
On Thu Apr 2, 2026 at 2:43 PM CEST, Daniel Kral wrote:
> Here's the v4 of the load balancer patches for the HA Manager.
>
> Most of the patches here are already R-b'd by @Dominik (many, many
> thanks!) and only a few things have changed, the biggest of course is
> changing the default node imbalance threshold from '0.7' to '0.3' and
> adding the pve-manager patches.
>
> I'm already half-way there with the pve-docs patches, but will send them
> in a separate patch series (as the changes are also updating the CRS
> section in general).
>
> Thank you very much for the feedback @Dominik, @Thomas, @Maximiliano,
> and @Jillian Morgan!
>
[snip]
Consider this as:
Tested-by: Dominik Rusovac <d.rusovac@proxmox.com>
* applied: [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (28 preceding siblings ...)
2026-04-02 14:24 ` [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Dominik Rusovac
@ 2026-04-02 16:07 ` Thomas Lamprecht
29 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2026-04-02 16:07 UTC (permalink / raw)
To: pve-devel, Daniel Kral
On Thu, 02 Apr 2026 14:43:54 +0200, Daniel Kral wrote:
> Here's the v4 of the load balancer patches for the HA Manager.
>
> Most of the patches here are already R-b'd by @Dominik (many, many
> thanks!) and only a few things have changed, the biggest of course is
> changing the default node imbalance threshold from '0.7' to '0.3' and
> adding the pve-manager patches.
>
> [...]
Applied, thanks to all involved, nice work!
cluster:
[1/3] datacenter config: restructure verbose description for the ha crs option
commit: 79cd0872a4dafa7bd480e2d70aca3757afd25e61
[2/3] datacenter config: add dynamic load scheduler option
commit: 871f0973bb6828247aa7ef2b72cca6565d84306d
[3/3] datacenter config: add auto rebalancing options
commit: f3f8347a1b2c343929e010bb7c9929098a226168
ha-manager:
[01/21] env: pve2: implement dynamic node and service stats
commit: addedabda082cd2fbc43ce114d3f62d1dab43c6e
[02/21] sim: hardware: pass correct types for static stats
commit: 6316e7e38ff2334fa63733622db9a96c834e1a05
[03/21] sim: hardware: factor out static stats' default values
commit: 2679bfb5eca9dc978b1d03a8ef094a090b08f4b8
[04/21] sim: hardware: fix static stats guard
commit: a9d1210db90eca0c06959c1df90d8035d3e01937
[05/21] sim: hardware: handle dynamic service stats
commit: 55375917fdd90d8d4457d686161e937cffc7e330
[06/21] sim: hardware: add set-dynamic-stats command
commit: c334bb4df905879c88ea75f5a5be43c3dd98bcec
[07/21] sim: hardware: add getters for dynamic {node,service} stats
commit: 75235b476937b2bf85f136b257d193d48231164d
[08/21] usage: pass service data to add_service_usage
commit: 7af6ee02a9f3c31adc47c6a0c5531eff9545dea3
[09/21] usage: pass service data to get_used_service_nodes
commit: 1ffe83333bb11981f9d4642a9d82a2c28c649f73
[10/21] add running flag to non-HA cluster service stats
commit: 54789d6b162d30e093522f6adbc224168c427877
[11/21] usage: use add_service to add service usage to nodes
commit: 9780600e3539f0851873f45cec6ac33ce7220212
[12/21] usage: add dynamic usage scheduler
commit: 6684f186212cf66ea54c7f6115778eb779ed3322
[13/21] test: add dynamic usage scheduler test cases
commit: c377eacd2022250bd6b229fed1b33c4f9b1c456e
[14/21] manager: rename execute_migration to queue_resource_motion
commit: 4c87446560bcea0bd9ed2d05c1b5ff3c561e093d
[15/21] manager: update_crs_scheduler_mode: factor out crs config
commit: 55cfbf0ac35448aa246ede4a07e12f95a09ada4e
[16/21] implement automatic rebalancing
commit: f0f21bc1c547e578cfb520d134d38530c759119c
[17/21] test: add resource bundle generation test cases
commit: 8c0f2312561a59cbafc5c910b36684a6c82eedc3
[18/21] test: add dynamic automatic rebalancing system test cases
commit: 36813aca2f1a1d0e1fcb839f611b8ea2f5039f26
[19/21] test: add static automatic rebalancing system test cases
commit: ada49c44a2e49bcb01c24c2828dd31ab0e27fff6
[20/21] test: add automatic rebalancing system test cases with TOPSIS method
commit: 1419ec503b5b9eaf4bfabf7a38ed3ee80b101234
[21/21] test: add automatic rebalancing system test cases with affinity rules
commit: 6699b3e5a192c50a380391a32bdd613340f16751
manager:
[1/4] ui: dc/options: make the ha crs strings translatable
commit: c3b5bbe4779a1e63cebd190c833ffe5f032d6d5f
[2/4] ui: dc/options: add dynamic load scheduler option for ha crs
commit: c4f15682a44f7494f9e89cdbd09875c7c75c209a
[3/4] ui: move cluster resource scheduling from dc/options into separate component
commit: 8a032a0c1dfec068c7abc1b40327e03f6568e731
[4/4] ui: form: add crs auto rebalancing options
commit: 4557bdf8155b8fc5e3e17b18e980ddda9e78b1d4