* [PATCH cluster v4 01/28] datacenter config: restructure verbose description for the ha crs option
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
This makes it a little easier to read and allows appending descriptions
for other values with a cleaner diff.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/DataCenterConfig.pm | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index d88b167..c275163 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -17,9 +17,12 @@ my $crs_format = {
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
- verbose_description => "Configures how the HA manager should select nodes to start or "
- . "recover services. With 'basic', only the number of services is used, with 'static', "
- . "static CPU and memory configuration of services is considered.",
+ verbose_description => <<EODESC,
+Configures how the HA Manager should select nodes to start or recover services:
+
+- with 'basic', only the number of services is used,
+- with 'static', static CPU and memory configuration of services are considered.
+EODESC
},
'ha-rebalance-on-start' => {
type => 'boolean',
--
2.47.3
* [PATCH cluster v4 02/28] datacenter config: add dynamic load scheduler option
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/DataCenterConfig.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index c275163..0225bc6 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -13,7 +13,7 @@ my $PROXMOX_OUI = 'BC:24:11';
my $crs_format = {
ha => {
type => 'string',
- enum => ['basic', 'static'],
+ enum => ['basic', 'static', 'dynamic'],
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
@@ -21,7 +21,8 @@ my $crs_format = {
Configures how the HA Manager should select nodes to start or recover services:
- with 'basic', only the number of services is used,
-- with 'static', static CPU and memory configuration of services are considered.
+- with 'static', static CPU and memory configuration of services are considered,
+- with 'dynamic', static and dynamic CPU and memory usage of services are considered.
EODESC
},
'ha-rebalance-on-start' => {
--
2.47.3
* [PATCH cluster v4 03/28] datacenter config: add auto rebalancing options
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
These options control the behavior of the load balancing system in the
HA Manager.
The imbalance threshold default value is set to `0.3`, as
experimentation with some common cluster sizes showed good results. This
might need more adaptation in the future, such as a cluster-size-dependent
profile setting to find a better default threshold value.
Another imbalance threshold default value that was considered was
`0.15`, which is the minimum threshold needed to detect an imbalance in
a cluster with one node at load 0.0 and all other nodes at load 1.0,
for cluster sizes of up to 45 nodes. For cluster size N, this is
derived with:
node_loads = [0.0] + [1.0 for _ in range(N-1)]
min_imbalance = calculate_node_imbalance(node_loads)
Though a good starting metric, the imbalance threshold of `0.15` would
be too sensitive for small cluster sizes, and `0.3` proved a better
trade-off.
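For reference, a minimal Python sketch of the derivation above. This assumes
the imbalance metric is the coefficient of variation (population standard
deviation divided by the mean) of the per-node loads -- an assumption, not
the HA manager's actual code, but it reproduces the 45-node figure:

```python
import statistics

def calculate_node_imbalance(node_loads):
    # Assumed metric: coefficient of variation of the per-node loads.
    mean = statistics.fmean(node_loads)
    return statistics.pstdev(node_loads) / mean if mean else 0.0

def min_imbalance(n):
    # One idle node, all remaining n-1 nodes fully loaded.
    node_loads = [0.0] + [1.0 for _ in range(n - 1)]
    return calculate_node_imbalance(node_loads)

# min_imbalance(45) ~= 0.151 and min_imbalance(46) ~= 0.149, so a
# threshold of 0.15 detects this imbalance for up to 45 nodes.
```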
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- change threshold default value from 0.7 to 0.3
- add minimum requirements to number fields
src/PVE/DataCenterConfig.pm | 44 +++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 0225bc6..6513594 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -33,6 +33,50 @@ EODESC
"Set to use CRS for selecting a suited node when a HA services request-state"
. " changes from stop to start.",
},
+ 'ha-auto-rebalance' => {
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ description => "Whether to use CRS for balancing HA resources automatically"
+ . " depending on the current node imbalance.",
+ },
+ 'ha-auto-rebalance-threshold' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0.0,
+ default => 0.3,
+ requires => 'ha-auto-rebalance',
+ description => "The threshold for the cluster node imbalance, which will"
+ . " trigger the automatic resource balancing system if its value"
+ . " is exceeded.",
+ },
+ 'ha-auto-rebalance-method' => {
+ type => 'string',
+ enum => ['bruteforce', 'topsis'],
+ optional => 1,
+ default => 'bruteforce',
+ requires => 'ha-auto-rebalance',
+ description => "The method to use for the scoring of balancing migrations.",
+ },
+ 'ha-auto-rebalance-hold-duration' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0,
+ default => 3,
+ requires => 'ha-auto-rebalance',
+ description => "The number of HA rounds for which the cluster node"
+ . " imbalance threshold must be exceeded before triggering an"
+ . " automatic resource balancing migration.",
+ },
+ 'ha-auto-rebalance-margin' => {
+ type => 'number',
+ optional => 1,
+ minimum => 0.0,
+ default => 0.1,
+ requires => 'ha-auto-rebalance',
+ description => "The minimum relative improvement in cluster node"
+ . " imbalance to commit to a resource balancing migration.",
+ },
};
my $migration_format = {
--
2.47.3
* Re: [PATCH cluster v4 03/28] datacenter config: add auto rebalancing options
From: Dominik Rusovac @ 2026-04-02 13:07 UTC
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:43 PM CEST, Daniel Kral wrote:
> These options control the behavior of the load balancing system in the
> HA Manager.
>
> The imbalance threshold default value is set to `0.3`, as
> experimentation with some common cluster sizes showed good results. This
> might need more adaptation in the future, such as a cluster-size-dependent
> profile setting to find a better default threshold value.
+1
>
> Another imbalance threshold default value that was considered was
> `0.15`, which is the minimum threshold needed to detect an imbalance in
> a cluster with one node at load 0.0 and all other nodes at load 1.0,
> for cluster sizes of up to 45 nodes. For cluster size N, this is
> derived with:
>
> node_loads = [0.0] + [1.0 for _ in range(N-1)]
> min_imbalance = calculate_node_imbalance(node_loads)
>
> Though a good starting metric, the imbalance threshold of `0.15` would
> be too sensitive for small cluster sizes, and `0.3` proved a better
> trade-off.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - change threshold default value from 0.7 to 0.3
> - add minimum requirements to number fields
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* [PATCH ha-manager v4 04/28] env: pve2: implement dynamic node and service stats
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
Fetch the dynamic node and service stats with rrd_dump(). These stats
are periodically sampled and broadcast by each PVE node's pvestatd
service and propagated through the pmxcfs.
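As a rough illustration of what this patch does with such a dump entry,
here is a Python sketch (the index constants are the ones introduced below,
taken from PVE::Service::pvestatd; the entry values are made up, and the
function name is hypothetical):

```python
# RRD entry indices for VMs/CTs, as defined in the patch.
RRD_VM_INDEX_MAXCPU = 5
RRD_VM_INDEX_CPU = 6
RRD_VM_INDEX_MAXMEM = 7
RRD_VM_INDEX_MEM = 8

def decode_vm_entry(rrdentry):
    # The broadcast cpu value is a fraction of the guest's maxcpu, so it
    # is scaled to "cores used" here, mirroring the Perl code below.
    maxcpu = float(rrdentry[RRD_VM_INDEX_MAXCPU] or 0.0)
    return {
        "maxcpu": maxcpu,
        "cpu": float(rrdentry[RRD_VM_INDEX_CPU] or 0.0) * maxcpu,
        "maxmem": int(rrdentry[RRD_VM_INDEX_MAXMEM] or 0),
        "mem": int(rrdentry[RRD_VM_INDEX_MEM] or 0),
    }
```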
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- introduce $HOSTNAME_RE and use it for nodename matching in
get_dynamic_node_stats()
src/PVE/HA/Env/PVE2.pm | 65 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 04cd1bfe..fc815fe0 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -42,8 +42,23 @@ my $lockdir = "/etc/pve/priv/lock";
# taken from PVE::Service::pvestatd::update_{lxc,qemu}_status()
use constant {
RRD_VM_INDEX_STATUS => 2,
+ RRD_VM_INDEX_MAXCPU => 5,
+ RRD_VM_INDEX_CPU => 6,
+ RRD_VM_INDEX_MAXMEM => 7,
+ RRD_VM_INDEX_MEM => 8,
};
+# rrd entry indices for PVE nodes
+# taken from PVE::Service::pvestatd::update_node_status()
+use constant {
+ RRD_NODE_INDEX_MAXCPU => 4,
+ RRD_NODE_INDEX_CPU => 5,
+ RRD_NODE_INDEX_MAXMEM => 7,
+ RRD_NODE_INDEX_MEM => 8,
+};
+
+my $HOSTNAME_RE = qr/(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}?[a-zA-Z0-9])?)/;
+
sub new {
my ($this, $nodename) = @_;
@@ -569,6 +584,30 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = get_cluster_service_stats();
+ for my $sid (keys %$stats) {
+ my $id = $stats->{$sid}->{id};
+ my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
+
+ # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
+ my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
+
+ $stats->{$sid}->{usage} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -588,6 +627,32 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = {};
+ for my $key (keys %$rrd) {
+ my ($nodename) = $key =~ m/^pve-node-9.0\/($HOSTNAME_RE)$/;
+
+ next if !$nodename;
+
+ my $rrdentry = $rrd->{$key} // [];
+
+ my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
+
+ $stats->{$nodename} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
* [PATCH ha-manager v4 05/28] sim: hardware: pass correct types for static stats
From: Daniel Kral @ 2026-04-02 12:43 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
The CRM expects f64 for cpu-related values and usize for mem-related
values. Hence, pass doubles for the former and integers for the latter.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 474cee16..cfcd7ab1 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -488,9 +488,9 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
+ node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +507,7 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4, maxmem => 4096 } } keys %$services };
+ my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
$self->write_static_service_stats($stats);
}
@@ -883,7 +883,7 @@ sub sim_hardware_cmd {
$self->set_static_service_stats(
$sid,
- { maxcpu => $params[0], maxmem => $params[1] },
+ { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
);
} elsif ($action eq 'manual-migrate') {
--
2.47.3
* [PATCH ha-manager v4 06/28] sim: hardware: factor out static stats' default values
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index cfcd7ab1..026be6f8 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,6 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_maxcpu = 4.0;
+my $default_service_maxmem = 4096 * 1024**2;
+my $default_node_maxcpu = 24.0;
+my $default_node_maxmem = 131072 * 1024**2;
+
# Status directory layout
#
# configuration
@@ -488,9 +493,24 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node1 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node2 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node3 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +527,12 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
+ my $stats = {
+ map {
+ $_ => { maxcpu => $default_service_maxcpu, maxmem => $default_service_maxmem }
+ }
+ keys %$services
+ };
$self->write_static_service_stats($stats);
}
--
2.47.3
* [PATCH ha-manager v4 07/28] sim: hardware: fix static stats guard
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
While falsy, values of 0 or 0.0 are valid stats. Hence, use a
'defined' check to avoid skipping falsy static service stats.
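The bug class, expressed as a Python sketch with a hypothetical helper
name: guard on "is not None" rather than truthiness, since 0 and 0.0 are
falsy but valid.

```python
def apply_new_stats(stats, new_stats):
    # Not `if val:` -- that would silently skip 0 and 0.0, which are
    # perfectly valid stat values.
    for key in ("maxmem", "maxcpu"):
        val = new_stats.get(key)
        if val is not None:
            stats[key] = val
    return stats
```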
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 026be6f8..afdb7b5f 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -202,11 +202,11 @@ sub set_static_service_stats {
my $stats = $self->read_static_service_stats();
- if (my $memory = $new_stats->{maxmem}) {
+ if (defined(my $memory = $new_stats->{maxmem})) {
$stats->{$sid}->{maxmem} = $memory;
}
- if (my $cpu = $new_stats->{maxcpu}) {
+ if (defined(my $cpu = $new_stats->{maxcpu})) {
$stats->{$sid}->{maxcpu} = $cpu;
}
--
2.47.3
* [PATCH ha-manager v4 08/28] sim: hardware: handle dynamic service stats
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
This adds functionality to simulate the dynamic stats of a service, that
is, CPU load (cores) and memory usage (MiB).
Analogous to static service stats, tests can specify dynamic service
stats in the file dynamic_service_stats.
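A hypothetical example of such a file's contents: plain JSON keyed by
service ID, with cpu in cores and mem in bytes (the simulator's defaults
below are 2.0 cores and 2048 MiB, stored as bytes). The service IDs here
are made up; a json round-trip models the same on-disk format that the
PVE::HA::Tools JSON helpers read and write.

```python
import json

# Hypothetical dynamic_service_stats contents for a two-service test.
stats = {
    "vm:101": {"cpu": 0.5, "mem": 512 * 1024**2},
    "vm:102": {"cpu": 2.0, "mem": 2048 * 1024**2},
}

serialized = json.dumps(stats, indent=4)
```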
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 52 ++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index afdb7b5f..3439bc36 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,8 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_cpu = 2.0;
my $default_service_maxcpu = 4.0;
+my $default_service_mem = 2048 * 1024**2;
my $default_service_maxmem = 4096 * 1024**2;
+
my $default_node_maxcpu = 24.0;
my $default_node_maxmem = 131072 * 1024**2;
@@ -213,6 +216,25 @@ sub set_static_service_stats {
$self->write_static_service_stats($stats);
}
+sub set_dynamic_service_stats {
+ my ($self, $sid, $new_stats) = @_;
+
+ my $conf = $self->read_service_config();
+ die "no such service '$sid'" if !$conf->{$sid};
+
+ my $stats = $self->read_dynamic_service_stats();
+
+ if (defined(my $memory = $new_stats->{mem})) {
+ $stats->{$sid}->{mem} = $memory;
+ }
+
+ if (defined(my $cpu = $new_stats->{cpu})) {
+ $stats->{$sid}->{cpu} = $cpu;
+ }
+
+ $self->write_dynamic_service_stats($stats);
+}
+
sub add_service {
my ($self, $sid, $opts, $running) = @_;
@@ -438,6 +460,16 @@ sub read_static_service_stats {
return $stats;
}
+sub read_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
+ $self->log('error', "loading dynamic service stats failed - $@") if $@;
+
+ return $stats;
+}
+
sub write_static_service_stats {
my ($self, $stats) = @_;
@@ -446,6 +478,14 @@ sub write_static_service_stats {
$self->log('error', "writing static service stats failed - $@") if $@;
}
+sub write_dynamic_service_stats {
+ my ($self, $stats) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ eval { PVE::HA::Tools::write_json_to_file($filename, $stats) };
+ $self->log('error', "writing dynamic service stats failed - $@") if $@;
+}
+
sub new {
my ($this, $testdir) = @_;
@@ -536,6 +576,18 @@ sub new {
$self->write_static_service_stats($stats);
}
+ if (-f "$testdir/dynamic_service_stats") {
+ copy("$testdir/dynamic_service_stats", "$statusdir/dynamic_service_stats");
+ } else {
+ my $services = $self->read_static_service_stats();
+ my $stats = {
+ map { $_ => { cpu => $default_service_cpu, mem => $default_service_mem } }
+ keys %$services
+ };
+
+ $self->write_dynamic_service_stats($stats);
+ }
+
my $cstatus = $self->read_hardware_status_nolock();
foreach my $node (sort keys %$cstatus) {
--
2.47.3
* [PATCH ha-manager v4 09/28] sim: hardware: add set-dynamic-stats command
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Add a command to set dynamic service stats, and handle the respective
set-dynamic-stats and set-static-stats commands analogously.
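The pairwise parameter handling can be sketched like this in Python
(function name hypothetical; the per-action conversion tables and the
MiB-to-bytes conversion match the Perl below):

```python
def parse_stat_params(action, params):
    # Alternating "<target> <value>" pairs, converted per action.
    conversions = (
        {"maxcpu": float, "maxmem": lambda v: int(v) * 1024**2}
        if action == "set-static-stats"
        else {"cpu": float, "mem": lambda v: int(v) * 1024**2}
    )
    if not params:
        raise ValueError(f"missing target stat for '{action}' command")
    if len(params) % 2:
        raise ValueError(f"missing value for '{action} {params[-1]}' command")
    new_stats = {}
    for target, value in zip(params[::2], params[1::2]):
        if target not in conversions:
            raise ValueError(f"unknown target stat '{target}' for '{action}' command")
        new_stats[target] = conversions[target](value)
    return new_stats
```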
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Hardware.pm | 34 ++++++++++++++++++++++++++--------
src/PVE/HA/Sim/RTHardware.pm | 4 +++-
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 3439bc36..b641f3c9 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -795,7 +795,8 @@ sub get_cfs_state {
# service <sid> stop <timeout>
# service <sid> lock/unlock [lockname]
# service <sid> add <node> [<request-state=started>] [<running=0>]
-# service <sid> set-static-stats <maxcpu> <maxmem>
+# service <sid> set-static-stats [maxcpu <cores>] [maxmem <MiB>]
+# service <sid> set-dynamic-stats [cpu <cores>] [mem <MiB>]
# service <sid> delete
sub sim_hardware_cmd {
my ($self, $cmdstr, $logid) = @_;
@@ -954,15 +955,32 @@ sub sim_hardware_cmd {
$params[2] || 0,
);
- } elsif ($action eq 'set-static-stats') {
- die "sim_hardware_cmd: missing maxcpu for '$action' command" if !$params[0];
- die "sim_hardware_cmd: missing maxmem for '$action' command" if !$params[1];
+ } elsif ($action eq 'set-static-stats' || $action eq 'set-dynamic-stats') {
+ die "sim_hardware_cmd: missing target stat for '$action' command"
+ if !@params;
- $self->set_static_service_stats(
- $sid,
- { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
- );
+ my $conversions =
+ $action eq 'set-static-stats'
+ ? { maxcpu => sub { 0.0 + $_[0] }, maxmem => sub { $_[0] * 1024**2 } }
+ : { cpu => sub { 0.0 + $_[0] }, mem => sub { $_[0] * 1024**2 } };
+ my %new_stats;
+ for my ($target, $val) (@params) {
+ die "sim_hardware_cmd: missing value for '$action $target' command"
+ if !defined($val);
+
+ my $convert = $conversions->{$target}
+ or die
+ "sim_hardware_cmd: unknown target stat '$target' for '$action' command";
+
+ $new_stats{$target} = $convert->($val);
+ }
+
+ if ($action eq 'set-static-stats') {
+ $self->set_static_service_stats($sid, \%new_stats);
+ } else {
+ $self->set_dynamic_service_stats($sid, \%new_stats);
+ }
} elsif ($action eq 'manual-migrate') {
die "sim_hardware_cmd: missing target node for '$action' command"
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 9a83d098..9528f542 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -532,7 +532,9 @@ sub show_service_add_dialog {
my $maxcpu = $cpu_count_spin->get_value();
my $maxmem = $memory_spin->get_value();
- $self->sim_hardware_cmd("service $sid set-static-stats $maxcpu $maxmem", 'command');
+ $self->sim_hardware_cmd(
+ "service $sid set-static-stats maxcpu $maxcpu maxmem $maxmem", 'command',
+ );
$self->add_service_to_gui($sid);
}
--
2.47.3
* [PATCH ha-manager v4 10/28] sim: hardware: add getters for dynamic {node,service} stats
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Aggregation of the dynamic node stats is done lazily.
The getters log at warning level in case of overcommitted stats.
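In outline, the aggregation works as sketched below (a Python sketch of
the Perl in get_dynamic_node_stats; names and data shapes hypothetical):
start from each node's static capacity with zeroed usage, then add the
dynamic usage of every service the LRM reports as running.

```python
def aggregate_dynamic_node_stats(node_stats, service_conf, service_stats, running):
    # Static capacities plus zeroed usage as the starting point.
    totals = {
        node: {**caps, "cpu": caps.get("cpu", 0.0), "mem": caps.get("mem", 0)}
        for node, caps in node_stats.items()
    }
    for sid, conf in service_conf.items():
        node = conf["node"]
        if not running.get(node, {}).get(sid):
            continue  # only count services actually running on the node
        usage = service_stats[sid]["usage"]
        totals[node]["cpu"] += usage["cpu"]
        totals[node]["mem"] += usage["mem"]
        if totals[node]["cpu"] > totals[node]["maxcpu"]:
            print(f"warning: overcommitted cpu on '{node}'")
        if totals[node]["mem"] > totals[node]["maxmem"]:
            print(f"warning: overcommitted mem on '{node}'")
    return totals
```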
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Sim/Env.pm | 12 ++++++++
src/PVE/HA/Sim/Hardware.pm | 61 ++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index ad51245c..65d4efad 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -500,12 +500,24 @@ sub get_static_service_stats {
return $self->{hardware}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{hardware}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index b641f3c9..1959f5c9 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1232,6 +1232,27 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $stats = get_cluster_service_stats($self);
+ my $static_stats = $self->read_static_service_stats();
+ my $dynamic_stats = $self->read_dynamic_service_stats();
+
+ for my $sid (keys %$stats) {
+ $stats->{$sid}->{usage} = {
+ $static_stats->{$sid}->%*, $dynamic_stats->{$sid}->%*,
+ };
+
+ $self->log('warning', "overcommitted cpu on '$sid'")
+ if $stats->{$sid}->{usage}->{cpu} > $stats->{$sid}->{usage}->{maxcpu};
+ $self->log('warning', "overcommitted mem on '$sid'")
+ if $stats->{$sid}->{usage}->{mem} > $stats->{$sid}->{usage}->{maxmem};
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -1245,6 +1266,46 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $stats = $self->get_static_node_stats();
+ for my $node (keys %$stats) {
+ $stats->{$node}->{maxcpu} = $stats->{$node}->{maxcpu} // $default_node_maxcpu;
+ $stats->{$node}->{cpu} = $stats->{$node}->{cpu} // 0.0;
+ $stats->{$node}->{maxmem} = $stats->{$node}->{maxmem} // $default_node_maxmem;
+ $stats->{$node}->{mem} = $stats->{$node}->{mem} // 0;
+ }
+
+ my $service_conf = $self->read_service_config();
+ my $dynamic_service_stats = $self->get_dynamic_service_stats();
+
+ my $cstatus = $self->read_hardware_status_nolock();
+ my $node_service_status = { map { $_ => $self->read_service_status($_) } keys %$cstatus };
+
+ for my $sid (keys %$service_conf) {
+ my $node = $service_conf->{$sid}->{node};
+
+ # only add the dynamic load usage to node if service is actually marked
+ # as running by the node service status written by the LRM
+ if ($node_service_status->{$node}->{$sid}) {
+ my ($cpu, $mem) = $dynamic_service_stats->{$sid}->{usage}->@{qw(cpu mem)};
+
+ die "unknown cpu load for '$sid'" if !defined($cpu);
+ $stats->{$node}->{cpu} += $cpu;
+ $self->log('warning', "overcommitted cpu on '$node'")
+ if $stats->{$node}->{cpu} > $stats->{$node}->{maxcpu};
+
+ die "unknown memory usage for '$sid'" if !defined($mem);
+ $stats->{$node}->{mem} += $mem;
+ $self->log('warning', "overcommitted mem on '$node'")
+ if $stats->{$node}->{mem} > $stats->{$node}->{maxmem};
+ }
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
* [PATCH ha-manager v4 11/28] usage: pass service data to add_service_usage
From: Daniel Kral @ 2026-04-02 12:44 UTC
To: pve-devel
The method already depends on three members of the service data, and a
following patch will need a fourth member to add more information to the
Usage implementations.
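The shape of the refactor, as a Python sketch (names and the node-selection
rules are a hypothetical simplification, not the HA manager's actual
logic): the whole service-data record is passed and unpacked in one place,
so future patches can consume additional members without touching every
call site.

```python
def get_used_service_nodes(online_nodes, state, node, target):
    # Hypothetical simplification: a non-stopped service consumes its
    # current node; a migrating one also consumes the target node.
    current = node if node in online_nodes and state != "stopped" else None
    tgt = target if target in online_nodes and state == "migrate" else None
    return current, tgt

def add_service_usage(online_nodes, sid, sd):
    # Destructure the service-data record here instead of threading
    # state/node/target through every caller.
    state, node, target = (sd.get(k) for k in ("state", "node", "target"))
    return get_used_service_nodes(online_nodes, state, node, target)
```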
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 11 +++++------
src/PVE/HA/Usage.pm | 6 +++---
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index fbc7f931..71f45b5c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -284,17 +284,17 @@ sub recompute_online_node_usage {
foreach my $sid (sort keys %{ $self->{ss} }) {
my $sd = $self->{ss}->{$sid};
- $online_node_usage->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $online_node_usage->add_service_usage($sid, $sd);
}
# add remaining non-HA resources to online node usage
for my $sid (sort keys %$service_stats) {
next if $self->{ss}->{$sid};
- my ($node, $state) = $service_stats->{$sid}->@{qw(node state)};
-
# the migration target is not known for non-HA resources
- $online_node_usage->add_service_usage($sid, $state, $node, undef);
+ my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+
+ $online_node_usage->add_service_usage($sid, $sd);
}
$self->{online_node_usage} = $online_node_usage;
@@ -332,8 +332,7 @@ my $change_service_state = sub {
}
$self->{online_node_usage}->remove_service_usage($sid);
- $self->{online_node_usage}
- ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $self->{online_node_usage}->add_service_usage($sid, $sd);
$sd->{uid} = compute_new_uuid($new_state);
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 9f19a82b..6d53f956 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -40,12 +40,12 @@ sub add_service_usage_to_node {
die "implement in subclass";
}
-# Adds service $sid's usage to the online nodes according to their $state,
-# $service_node and $migration_target.
+# Adds service $sid's usage to the online nodes according to their service data $sd.
sub add_service_usage {
- my ($self, $sid, $service_state, $service_node, $migration_target) = @_;
+ my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
+ my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
my ($current_node, $target_node) =
get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
--
2.47.3
* [PATCH ha-manager v4 12/28] usage: pass service data to get_used_service_nodes
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (10 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 11/28] usage: pass service data to add_service_usage Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats Daniel Kral
` (17 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Also remove some unnecessary destructuring syntax around the helper.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Rules/ResourceAffinity.pm | 3 +--
src/PVE/HA/Usage.pm | 13 ++++++-------
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 1c610430..474d3000 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -511,8 +511,7 @@ sub get_resource_affinity {
my $get_used_service_nodes = sub {
my ($sid) = @_;
return (undef, undef) if !defined($ss->{$sid});
- my ($state, $node, $target) = $ss->{$sid}->@{qw(state node target)};
- return PVE::HA::Usage::get_used_service_nodes($online_nodes, $state, $node, $target);
+ return PVE::HA::Usage::get_used_service_nodes($online_nodes, $ss->{$sid});
};
for my $csid (keys $positive->%*) {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 6d53f956..be3e64d6 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -45,9 +45,7 @@ sub add_service_usage {
my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
- my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
- my ($current_node, $target_node) =
- get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
+ my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
$self->add_service_usage_to_node($current_node, $sid) if $current_node;
$self->add_service_usage_to_node($target_node, $sid) if $target_node;
@@ -66,11 +64,12 @@ sub score_nodes_to_start_service {
die "implement in subclass";
}
-# Returns the current and target node as a two-element array, that a service
-# puts load on according to the $online_nodes and the service's $state, $node
-# and $target.
+# Returns a two-element array of the nodes a service puts load on
+# (current and target), given $online_nodes and service data $sd.
sub get_used_service_nodes {
- my ($online_nodes, $state, $node, $target) = @_;
+ my ($online_nodes, $sd) = @_;
+
+ my ($state, $node, $target) = $sd->@{qw(state node target)};
return (undef, undef) if $state eq 'stopped' || $state eq 'request_start';
--
2.47.3
* [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (11 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 12/28] usage: pass service data to get_used_service_nodes Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes Daniel Kral
` (16 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The running flag is needed to distinguish starting resources from
started ones; it is a required parameter of the new add_service(...)
method of the resource scheduling bindings.
The HA Manager tracks whether HA resources are in 'started' state and
whether the LRM acknowledged that these are running. For non-HA
resources, the rrd_dump data contains a running flag for VM and CT
guests.
See the next patch, which makes the usage implementations pass the
running flag to the add_service(...) method, for more details.
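As a sketch (assuming the entry shape from the hunks below), the running
flag for non-HA resources is derived directly from the reported state:

```perl
use strict;
use warnings;

# hypothetical stats entry for a non-HA guest, as assembled in
# get_cluster_service_stats(); only started guests count as running
my $state = 'started';
my $entry = {
    node    => 'node1',
    state   => $state,
    running => $state eq 'started',
};

print $entry->{running} ? "running\n" : "not running\n";    # prints "running"
```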
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Env/PVE2.pm | 1 +
src/PVE/HA/Manager.pm | 2 +-
src/PVE/HA/Sim/Hardware.pm | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index fc815fe0..3caf32fc 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -551,6 +551,7 @@ my sub get_cluster_service_stats {
id => $id,
node => $nodename,
state => $state,
+ running => $state eq 'started',
type => $type,
usage => {},
};
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 71f45b5c..5b2715c7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -292,7 +292,7 @@ sub recompute_online_node_usage {
next if $self->{ss}->{$sid};
# the migration target is not known for non-HA resources
- my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+ my $sd = { $service_stats->{$sid}->%{qw(node state running)} };
$online_node_usage->add_service_usage($sid, $sd);
}
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 1959f5c9..82f85c97 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1201,6 +1201,7 @@ my sub get_cluster_service_stats {
$stats->{$sid} = {
node => $cfg->{node},
state => $cfg->{state},
+ running => $cfg->{state} eq 'started',
usage => {},
};
}
--
2.47.3
* [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (12 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 13/28] add running flag to non-HA cluster service stats Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler Daniel Kral
` (15 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The pve_static (and upcoming pve_dynamic) bindings expose the new
add_resource(...) method, which allows adding resources in a single call
together with the additional running flag.
The running flag is needed to discriminate starting and started HA
resources from each other, which is needed to correctly account for HA
resources for the dynamic load usage implementation in the next patch.
This is because, for the dynamic load usage, any HA resource that is
scheduled to start by the HA Manager in the same round would otherwise
not be accounted for in the next call to
score_nodes_to_start_resource(...). This is not a problem for the static
load usage, because there the current node usages are already derived
from the started resources on every call.
Passing only the HA resources' 'state' property is not enough, since the
HA Manager will move any HA resource from the 'request_start' state (or
through other transient states such as 'request_start_balance' and a
successful 'migrate'/'relocate') into the 'started' state.
This 'started' state is then picked up by the HA resource's LRM, which
will actually start the HA resource and, if successful, respond with a
'SUCCESS' LRM result. Only then does the HA Manager acknowledge this by
adding the running flag to the HA resource's state.
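The state machine described above boils down to a single condition; the
following is a sketch of the check added to add_service_usage() below
(the helper name is made up for illustration):

```perl
use strict;
use warnings;

# a service counts as running if the LRM already acknowledged the start
# ('started' state with the running flag set), or if it is in a transient
# non-'started' state but still occupies a current node (e.g. 'migrate')
sub is_considered_running {
    my ($sd, $current_node) = @_;

    return ($sd->{state} eq 'started' && $sd->{running})
        || ($sd->{state} ne 'started' && defined($current_node));
}

print is_considered_running({ state => 'migrate', running => 0 }, 'node1')
    ? "running\n" : "not running\n";    # prints "running"
```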
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Usage.pm | 13 ++++++++-----
src/PVE/HA/Usage/Basic.pm | 9 ++++++++-
src/PVE/HA/Usage/Static.pm | 30 ++++++++++++++++++++++++------
3 files changed, 40 insertions(+), 12 deletions(-)
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index be3e64d6..43feb041 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,9 +33,8 @@ sub contains_node {
die "implement in subclass";
}
-# Logs a warning to $haenv upon failure, but does not die.
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
die "implement in subclass";
}
@@ -47,8 +46,12 @@ sub add_service_usage {
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
- $self->add_service_usage_to_node($current_node, $sid) if $current_node;
- $self->add_service_usage_to_node($target_node, $sid) if $target_node;
+ # some usage implementations need to discern whether a service is truly running;
+ # a service only has the 'running' flag in the 'started' state
+ my $running = ($sd->{state} eq 'started' && $sd->{running})
+ || ($sd->{state} ne 'started' && defined($current_node));
+
+ $self->add_service($sid, $current_node, $target_node, $running);
}
sub remove_service_usage {
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 2584727b..5aa3ac05 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -38,7 +38,7 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
-sub add_service_usage_to_node {
+my sub add_service_usage_to_node {
my ($self, $nodename, $sid) = @_;
if ($self->contains_node($nodename)) {
@@ -51,6 +51,13 @@ sub add_service_usage_to_node {
}
}
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ add_service_usage_to_node($self, $current_node, $sid) if defined($current_node);
+ add_service_usage_to_node($self, $target_node, $sid) if defined($target_node);
+}
+
sub remove_service_usage {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index b60f5000..8c7a614b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -71,17 +71,35 @@ my sub get_service_usage {
return $service_stats;
}
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
- $self->{'node-services'}->{$nodename}->{$sid} = 1;
+ # do not add services that do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ # PVE::RS::ResourceScheduling::Static::add_service() expects $current_node
+ # to be set, so consider $target_node as $current_node for unset $current_node;
+ #
+ # currently, this happens for the request_start_balance service state and if
+ # node maintenance causes services to migrate to other nodes
+ if (!defined($current_node)) {
+ $current_node = $target_node;
+ undef $target_node;
+ }
eval {
my $service_usage = get_service_usage($self, $sid);
- $self->{scheduler}->add_service_usage_to_node($nodename, $sid, $service_usage);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ 'current-node' => $current_node,
+ 'target-node' => $target_node,
+ };
+
+ $self->{scheduler}->add_service($sid, $service);
};
- $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
- if $@;
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
}
sub remove_service_usage {
--
2.47.3
* [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (13 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 14/28] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases Daniel Kral
` (14 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The dynamic usage scheduler allows the HA Manager to make scheduling
decisions based on the current usage of the nodes and cluster resources
in addition to the maximum usage stats as reported by the PVE::HA::Env
implementation.
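One detail worth a sketch (the values below are made up):
score_nodes_to_start_resource(...) from the scheduling bindings returns
[node, score] pairs where a higher score is better, while the Perl
callers expect a hash where lower is better, hence the sign flip at the
end of score_nodes_to_start_service() below:

```perl
use strict;
use warnings;

# hypothetical score list as returned by the scheduling bindings,
# where a higher score marks a better target node
my $score_list = [ [ 'node1', 0.8 ], [ 'node2', 0.3 ] ];

# flip the sign so that a lower value is better, as the callers expect
my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };

# node1 (highest raw score) now has the lowest value, i.e. wins
my ($best) = sort { $scores->{$a} <=> $scores->{$b} } keys $scores->%*;
print "$best\n";    # prints "node1"
```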
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Env.pm | 12 ++++
src/PVE/HA/Manager.pm | 21 ++++++
src/PVE/HA/Usage/Dynamic.pm | 122 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Makefile | 2 +-
5 files changed, 157 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage/Dynamic.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 38d5d60b..75220a0b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -42,6 +42,7 @@
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
/usr/share/perl5/PVE/HA/Usage/Static.pm
+/usr/share/perl5/PVE/HA/Usage/Dynamic.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
/usr/share/pve-manager/templates/default/fencing-body.html.hbs
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 3643292e..44c26854 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -312,12 +312,24 @@ sub get_static_service_stats {
return $self->{plug}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{plug}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 5b2715c7..c60ab595 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -21,6 +21,12 @@ eval {
$have_static_scheduling = 1;
};
+my $have_dynamic_scheduling;
+eval {
+ require PVE::HA::Usage::Dynamic;
+ $have_dynamic_scheduling = 1;
+};
+
## Variable Name & Abbreviations Convention
#
# The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -267,6 +273,21 @@ sub recompute_online_node_usage {
'warning',
"fallback to 'basic' scheduler mode, init for 'static' failed - $@",
) if $@;
+ } elsif ($mode eq 'dynamic') {
+ if ($have_dynamic_scheduling) {
+ $online_node_usage = eval {
+ $service_stats = $haenv->get_dynamic_service_stats();
+ my $scheduler = PVE::HA::Usage::Dynamic->new($haenv, $service_stats);
+ $scheduler->add_node($_) for $online_nodes->@*;
+ return $scheduler;
+ };
+ } else {
+ $@ = "dynamic scheduling not available\n";
+ }
+ $haenv->log(
+ 'warning',
+ "fallback to 'basic' scheduler mode, init for 'dynamic' failed - $@",
+ ) if $@;
} elsif ($mode eq 'basic') {
# handled below in the general fall-back case
} else {
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
new file mode 100644
index 00000000..24c85a41
--- /dev/null
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -0,0 +1,122 @@
+package PVE::HA::Usage::Dynamic;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Dynamic;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv, $service_stats) = @_;
+
+ my $node_stats = eval { $haenv->get_dynamic_node_stats() };
+ die "did not get dynamic node usage information - $@" if $@;
+
+ my $scheduler = eval { PVE::RS::ResourceScheduling::Dynamic->new() };
+ die "unable to initialize dynamic scheduling - $@" if $@;
+
+ return bless {
+ 'node-stats' => $node_stats,
+ 'service-stats' => $service_stats,
+ haenv => $haenv,
+ scheduler => $scheduler,
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ my $stats = $self->{'node-stats'}->{$nodename}
+ or die "did not get dynamic node usage information for '$nodename'\n";
+ die "dynamic node usage information for '$nodename' missing cpu count\n"
+ if !defined($stats->{maxcpu});
+ die "dynamic node usage information for '$nodename' missing memory\n"
+ if !defined($stats->{maxmem});
+
+ eval { $self->{scheduler}->add_node($nodename, $stats); };
+ die "initializing dynamic node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+ my ($self, $sid) = @_;
+
+ my $service_stats = $self->{'service-stats'}->{$sid}->{usage}
+ or die "did not get dynamic service usage information for '$sid'\n";
+
+ return $service_stats;
+}
+
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ # do not add a service that does not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ # PVE::RS::ResourceScheduling::Dynamic::add_resource() expects $current_node
+ # to be set, so consider $target_node as $current_node for unset $current_node;
+ #
+ # currently, this happens for the request_start_balance service state and if
+ # node maintenance causes services to migrate to other nodes
+ if (!defined($current_node)) {
+ $current_node = $target_node;
+ undef $target_node;
+ }
+
+ eval {
+ my $service_usage = get_service_usage($self, $sid);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ 'current-node' => $current_node,
+ 'target-node' => $target_node,
+ };
+
+ $self->{scheduler}->add_resource($sid, $service);
+ };
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
+}
+
+sub remove_service_usage {
+ my ($self, $sid) = @_;
+
+ eval { $self->{scheduler}->remove_resource($sid) };
+ $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid) = @_;
+
+ my $score_list = eval {
+ my $service_usage = get_service_usage($self, $sid);
+ $self->{scheduler}->score_nodes_to_start_resource($service_usage);
+ };
+ $self->{haenv}
+ ->log('err', "unable to score nodes according to dynamic usage for service '$sid' - $@")
+ if $@;
+
+ # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+ return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index befdda60..5d51a9c1 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,5 +1,5 @@
SIM_SOURCES=Basic.pm
-SOURCES=${SIM_SOURCES} Static.pm
+SOURCES=${SIM_SOURCES} Static.pm Dynamic.pm
.PHONY: install
install:
--
2.47.3
* [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (14 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 15/28] usage: add dynamic usage scheduler Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion Daniel Kral
` (13 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the scheduler using the
dynamic usage information of the HA resources, with rebalance-on-start
disabled and enabled, respectively.
As the mechanisms for the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:
- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
correctly simulates that the previously scheduled HA resources are
already started
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/test/test-crs-dynamic-rebalance1/README | 3 +
src/test/test-crs-dynamic-rebalance1/cmdlist | 4 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 7 ++
.../hardware_status | 5 ++
.../test-crs-dynamic-rebalance1/log.expect | 82 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 7 ++
.../static_service_stats | 7 ++
src/test/test-crs-dynamic1/README | 4 +
src/test/test-crs-dynamic1/cmdlist | 4 +
src/test/test-crs-dynamic1/datacenter.cfg | 6 ++
.../test-crs-dynamic1/dynamic_service_stats | 3 +
src/test/test-crs-dynamic1/hardware_status | 5 ++
src/test/test-crs-dynamic1/log.expect | 51 ++++++++++++
src/test/test-crs-dynamic1/manager_status | 1 +
src/test/test-crs-dynamic1/service_config | 3 +
.../test-crs-dynamic1/static_service_stats | 3 +
18 files changed, 203 insertions(+)
create mode 100644 src/test/test-crs-dynamic-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic1/README
create mode 100644 src/test/test-crs-dynamic1/cmdlist
create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic1/hardware_status
create mode 100644 src/test/test-crs-dynamic1/log.expect
create mode 100644 src/test/test-crs-dynamic1/manager_status
create mode 100644 src/test/test-crs-dynamic1/service_config
create mode 100644 src/test/test-crs-dynamic1/static_service_stats
diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how, after a node failure, the recovery
+gets balanced out for a small batch of HA resources with the dynamic
+usage information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-rebalance-on-start": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+ "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+ "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+ "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+ "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..5c8b050c
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,82 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:105: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:101 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:102 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:102 - end relocate to node 'node2'
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: service vm:104 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:104 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:105 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:105 - end relocate to node 'node2'
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 40 node1/crm: service 'vm:104': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:105': state changed from 'request_start_balance' to 'started' (node = node2)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 41 node1/lrm: starting service vm:104
+info 41 node1/lrm: service status vm:104 started
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 43 node2/lrm: starting service vm:105
+info 43 node2/lrm: service status vm:105 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "dynamic"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'dynamic'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (15 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 16/28] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
` (12 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
The name is misleading: the method does not execute the HA resource
migration, but only queues the HA resource to change into the state
'migrate' or 'relocate', which is then picked up by the respective LRM
for execution.
The term 'resource motion' also generalizes the different actions
implied by the 'migrate' and 'relocate' commands and states.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c60ab595..c8a1a35b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -419,7 +419,7 @@ sub read_lrm_status {
return ($results, $modes);
}
-sub execute_migration {
+sub queue_resource_motion {
my ($self, $cmd, $task, $sid, $target) = @_;
my ($haenv, $ss) = $self->@{qw(haenv ss)};
@@ -488,7 +488,7 @@ sub update_crm_commands {
"ignore crm command - service already on target node: $cmd",
);
} else {
- $self->execute_migration($cmd, $task, $sid, $node);
+ $self->queue_resource_motion($cmd, $task, $sid, $node);
}
}
} else {
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (16 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 17/28] manager: rename execute_migration to queue_resource_motion Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
` (11 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/PVE/HA/Manager.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c8a1a35b..2576c762 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -94,11 +94,12 @@ sub update_crs_scheduler_mode {
my $haenv = $self->{haenv};
my $dc_cfg = $haenv->get_datacenter_settings();
+ my $crs_cfg = $dc_cfg->{crs};
- $self->{crs}->{rebalance_on_request_start} = !!$dc_cfg->{crs}->{'ha-rebalance-on-start'};
+ $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
my $old_mode = $self->{crs}->{scheduler};
- my $new_mode = $dc_cfg->{crs}->{ha} || 'basic';
+ my $new_mode = $crs_cfg->{ha} || 'basic';
if (!defined($old_mode)) {
$haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 19/28] implement automatic rebalancing
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (17 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 18/28] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:14 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases Daniel Kral
` (10 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
If the automatic load balancing system is enabled, it checks whether the
cluster node imbalance exceeds a user-defined threshold for a number of
consecutive HA Manager rounds (the "hold duration"). If it does, it
chooses the resource motion that best improves the cluster node
imbalance and queues it, but only if it improves the imbalance by at
least a user-defined relative improvement (the "margin").
This patch introduces resource bundles, which ensure that HA resources
in strict positive resource affinity rules are considered as a whole
"bundle" instead of as individual HA resources.
Specifically, active and stationary resource bundles are resource
bundles that have at least one resource running and all resources
located on the same node. This distinction is needed as newly created
strict positive resource affinity rules may still require some resource
motions to enforce the rule.
Additionally, the migration candidate generation prunes any target
nodes that do not adhere to the HA rules of these resource bundles
before scoring the migration candidates.
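The gating described in the commit message (threshold, hold duration,
margin) can be modeled compactly. The following is an illustrative
Python sketch of the decision flow, not the actual Perl implementation;
the defaults mirror this patch (threshold 0.3, hold duration 3 rounds,
margin 0.1), and all names are hypothetical:

```python
class Rebalancer:
    """Model of the auto-rebalance gating: threshold, hold duration, margin."""

    def __init__(self, threshold=0.3, hold_duration=3, margin=0.1):
        self.threshold = threshold
        self.hold_duration = hold_duration
        self.margin = margin
        # round counter; in the patch this is not persisted across a CRM failover
        self.sustained_rounds = 0

    def decide(self, imbalance, best_target_imbalance):
        """Return True if the best candidate motion should be queued this round."""
        # the <= relation also prevents triggering for imbalance == 0.0
        if imbalance <= self.threshold:
            self.sustained_rounds = 0
            return False
        self.sustained_rounds += 1
        if self.sustained_rounds < self.hold_duration:
            return False
        self.sustained_rounds = 0
        # only queue the motion if it improves the imbalance by at least the margin
        relative_change = (imbalance - best_target_imbalance) / imbalance
        return relative_change >= self.margin

r = Rebalancer()
# imbalance 0.5 sustained for three rounds; best candidate would reach 0.2
print([r.decide(0.5, 0.2) for _ in range(3)])  # [False, False, True]
```

A candidate that barely improves the imbalance (e.g. 0.5 -> 0.48, a 4%
relative change) is rejected by the 10% margin even after the hold
duration elapses.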
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- change imbalance threshold default value from 0.7 to 0.3
- use sprintf() for float number printing instead of perly rounding
logic
- print before and expected after values
- implement PVE::HA::Usage::Basic rebalancing methods as well with
sensible return values, but which are only used to not throw errors if
a failback from 'dynamic'/'static' to 'basic' happens in
recompute_online_node_usage()
src/PVE/HA/Manager.pm | 178 +++++++++++++++++++++++++++++++++++-
src/PVE/HA/Usage.pm | 34 +++++++
src/PVE/HA/Usage/Basic.pm | 18 ++++
src/PVE/HA/Usage/Dynamic.pm | 33 +++++++
src/PVE/HA/Usage/Static.pm | 33 +++++++
5 files changed, 295 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 2576c762..b69a6bba 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -59,10 +59,17 @@ sub new {
my $self = bless {
haenv => $haenv,
- crs => {},
+ crs => {
+ auto_rebalance => {},
+ },
last_rules_digest => '',
last_groups_digest => '',
last_services_digest => '',
+ # used to track how many HA rounds the imbalance threshold has been exceeded
+ #
+ # this is not persisted for a CRM failover as in the mean time
+ # the usage statistics might have change quite a bit already
+ sustained_imbalance_round => 0,
group_migration_round => 3, # wait a little bit
}, $class;
@@ -97,6 +104,13 @@ sub update_crs_scheduler_mode {
my $crs_cfg = $dc_cfg->{crs};
$self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
+ $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
+ $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.3;
+ $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
+ // 'bruteforce';
+ $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
+ // 3;
+ $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
my $old_mode = $self->{crs}->{scheduler};
my $new_mode = $crs_cfg->{ha} || 'basic';
@@ -114,6 +128,149 @@ sub update_crs_scheduler_mode {
return;
}
+# Returns a hash of lists, which contain the running, non-moving HA resource
+# bundles, which are on the same node, implied by the strict positive resource
+# affinity rules.
+#
+# Each resource bundle has a leader, which is the alphabetically first running
+# HA resource in the resource bundle and also the key of each resource bundle
+# in the returned hash.
+sub get_active_stationary_resource_bundles {
+ my ($ss, $resource_affinity) = @_;
+
+ my $resource_bundles = {};
+OUTER: for my $sid (sort keys %$ss) {
+ # do not consider non-started resource as 'active' leading resource
+ next if $ss->{$sid}->{state} ne 'started';
+
+ my @resources = ($sid);
+ my $nodes = { $ss->{$sid}->{node} => 1 };
+
+ my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
+ if (%$dependent_resources) {
+ for my $csid (keys %$dependent_resources) {
+ next if !defined($ss->{$csid});
+ my ($state, $node) = $ss->{$csid}->@{qw(state node)};
+
+ # do not consider stationary bundle if a dependent resource moves
+ next OUTER if $state eq 'migrate' || $state eq 'relocate';
+ # do not add non-started resource to active bundle
+ next if $state ne 'started';
+
+ $nodes->{$node} = 1;
+
+ push @resources, $csid;
+ }
+
+ @resources = sort @resources;
+ }
+
+ # skip resource bundles, which are not on the same node yet
+ next if keys %$nodes > 1;
+
+ my $leader_sid = $resources[0];
+
+ $resource_bundles->{$leader_sid} = \@resources;
+ }
+
+ return $resource_bundles;
+}
+
+# Returns a hash of hashes, where each item contains the resource bundle's
+# leader, the list of HA resources in the resource bundle, and the list of
+# possible nodes to migrate to.
+sub get_resource_migration_candidates {
+ my ($self) = @_;
+
+ my ($ss, $compiled_rules, $online_node_usage) =
+ $self->@{qw(ss compiled_rules online_node_usage)};
+ my ($node_affinity, $resource_affinity) =
+ $compiled_rules->@{qw(node-affinity resource-affinity)};
+
+ my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ my @compact_migration_candidates = ();
+ for my $leader_sid (sort keys %$resource_bundles) {
+ my $current_leader_node = $ss->{$leader_sid}->{node};
+ my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
+
+ my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+ my ($together, $separate) =
+ get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
+ apply_negative_resource_affinity($separate, $target_nodes);
+
+ delete $target_nodes->{$current_leader_node};
+
+ next if !%$target_nodes;
+
+ push @compact_migration_candidates,
+ {
+ leader => $leader_sid,
+ nodes => [sort keys %$target_nodes],
+ resources => $resource_bundles->{$leader_sid},
+ };
+ }
+
+ return \@compact_migration_candidates;
+}
+
+sub load_balance {
+ my ($self) = @_;
+
+ my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
+ my ($auto_rebalance_opts) = $crs->{auto_rebalance};
+
+ return if !$auto_rebalance_opts->{enable};
+ return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
+ return if $self->any_resource_motion_queued_or_running();
+
+ my ($threshold, $method, $hold_duration, $margin) =
+ $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
+
+ my $imbalance = $online_node_usage->calculate_node_imbalance();
+
+ # do not load balance unless imbalance threshold has been exceeded
+ # consecutively for $hold_duration calls to load_balance();
+ # the <= relation prevents load balancing from triggering for $imbalance = 0.0
+ if ($imbalance <= $threshold) {
+ $self->{sustained_imbalance_round} = 0;
+ return;
+ } else {
+ $self->{sustained_imbalance_round}++;
+ return if $self->{sustained_imbalance_round} < $hold_duration;
+ $self->{sustained_imbalance_round} = 0;
+ }
+
+ my $candidates = $self->get_resource_migration_candidates();
+
+ my $result;
+ if ($method eq 'bruteforce') {
+ $result = $online_node_usage->select_best_balancing_migration($candidates);
+ } elsif ($method eq 'topsis') {
+ $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
+ }
+
+ # happens if $candidates is empty or $method isn't handled above
+ return if !$result;
+
+ my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
+
+ my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
+ return if $relative_change < $margin;
+
+ my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
+
+ my (undef, $type, $id) = $haenv->parse_sid($sid);
+ my $task = $type eq 'vm' ? "migrate" : "relocate";
+ my $cmd = "$task $sid $target";
+
+ my $imbalance_change_str =
+ sprintf("expected change for imbalance from %.2f to %.2f", $imbalance, $target_imbalance);
+ $haenv->log('info', "auto rebalance - $task $sid to $target ($imbalance_change_str)");
+
+ $self->queue_resource_motion($cmd, $task, $sid, $target);
+}
+
sub cleanup {
my ($self) = @_;
@@ -466,6 +623,21 @@ sub queue_resource_motion {
}
}
+sub any_resource_motion_queued_or_running {
+ my ($self) = @_;
+
+ my ($ss) = $self->@{qw(ss)};
+
+ for my $sid (keys %$ss) {
+ my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
+
+ return 1 if $state eq 'migrate' || $state eq 'relocate';
+ return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
+ }
+
+ return 0;
+}
+
# read new crm commands and save them into crm master status
sub update_crm_commands {
my ($self) = @_;
@@ -902,6 +1074,10 @@ sub manage {
return; # disarm active and progressing, skip normal service state machine
}
# disarm deferred - fall through but only process services in transient states
+ } else {
+ # load balance only if disarm is disabled as during a deferred disarm
+ # the HA Manager should not introduce any new migrations
+ $self->load_balance();
}
$self->{all_lrms_disarmed} = 0;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 43feb041..659ab30a 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -60,6 +60,40 @@ sub remove_service_usage {
die "implement in subclass";
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ die "implement in subclass";
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration_topsis {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
# Returns a hash with $nodename => $score pairs. A lower $score is better.
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 5aa3ac05..4dce9e17 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -66,6 +66,24 @@ sub remove_service_usage {
}
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ return 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ return [];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ return [];
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
index 24c85a41..76d0feaa 100644
--- a/src/PVE/HA/Usage/Dynamic.pm
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -104,6 +104,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate dynamic node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 8c7a614b..e67d5f5b 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -111,6 +111,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate static node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH ha-manager v4 19/28] implement automatic rebalancing
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
@ 2026-04-02 13:14 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:14 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> If the automatic load balancing system is enabled, it checks whether the
> cluster node imbalance exceeds some user-defined threshold for some HA
> Manager rounds ("hold duration"). If it does exceed on consecutive HA
> Manager rounds, it will choose the best resource motion to improve the
> cluster node imbalance and queue it if it significantly improves it by
> some user-defined imbalance improvement ("margin").
>
> This patch introduces resource bundles, which ensure that HA resources
> in strict positive resource affinity rules are considered as a whole
> "bundle" instead of individual HA resources.
>
> Specifically, active and stationary resource bundles are resource
> bundles, that have at least one resource running and all resources
> located on the same node. This distinction is needed as newly created
> strict positive resource affinity rules may still require some resource
> motions to enforce the rule.
>
> Additionally, the migration candidate generation prunes any target
> nodes, which do not adhere to the HA rules of these resource bundles
> before scoring these migration candidates.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - change imbalance threshold default value from 0.7 to 0.3
> - use sprintf() for float number printing instead of perly rounding
> logic
> - print before and expected after values
> - implement PVE::HA::Usage::Basic rebalancing methods as well with
> sensible return values, but which are only used to not throw errors if
> a failback from 'dynamic'/'static' to 'basic' happens in
> recompute_online_node_usage()
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (18 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 19/28] implement automatic rebalancing Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
` (9 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document which resource bundles count as active and
stationary and ensure that get_active_stationary_resource_bundles(...)
produces the correct active, stationary resource bundles.
This is especially important because these resource bundles are used
for the load balancing candidate generation, which is passed to
score_best_balancing_migration_candidates($candidates, ...). The
PVE::HA::Usage::{Static,Dynamic} implementation validates these
candidates and fails with a user-visible error message.
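As a rough illustration of what these test cases exercise, the grouping
behavior of get_active_stationary_resource_bundles(...) can be sketched
as follows. This is a simplified Python model of the documented
semantics, not the Perl code itself:

```python
def active_stationary_bundles(ss, positive_affinity):
    """Group started resources into bundles keyed by the alphabetically first
    started member (the "leader"); skip bundles that span several nodes or
    contain a moving (migrate/relocate) member."""
    bundles = {}
    for sid in sorted(ss):
        if ss[sid]['state'] != 'started':
            continue  # a non-started resource cannot lead an active bundle
        members, nodes, moving = [sid], {ss[sid]['node']}, False
        for csid in positive_affinity.get(sid, {}):
            if csid not in ss:
                continue
            state, node = ss[csid]['state'], ss[csid]['node']
            if state in ('migrate', 'relocate'):
                moving = True  # a dependent resource moves -> not stationary
                break
            if state != 'started':
                continue  # stopped dependents are left out of the bundle
            nodes.add(node)
            members.append(csid)
        if moving or len(nodes) > 1:
            continue  # bundle is moving or not yet on a single node
        members = sorted(members)
        bundles[members[0]] = members
    return bundles
```

For example, with vm:101 stopped and vm:102/vm:103 started on node1 in a
common positive affinity rule, the resulting bundle is keyed by vm:102
and contains only the two started resources, matching the "resource
bundle with first resource stopped" test case.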
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
src/test/Makefile | 1 +
src/test/test_resource_bundles.pl | 234 ++++++++++++++++++++++++++++++
2 files changed, 235 insertions(+)
create mode 100755 src/test/test_resource_bundles.pl
diff --git a/src/test/Makefile b/src/test/Makefile
index 6da9e100..f72b755b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -6,6 +6,7 @@ test:
@echo "-- start regression tests --"
./test_failover1.pl
./test_rules_config.pl
+ ./test_resource_bundles.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
diff --git a/src/test/test_resource_bundles.pl b/src/test/test_resource_bundles.pl
new file mode 100755
index 00000000..d38dc516
--- /dev/null
+++ b/src/test/test_resource_bundles.pl
@@ -0,0 +1,234 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use Test::More;
+
+use PVE::HA::Manager;
+
+my $get_active_stationary_resource_bundle_tests = [
+ {
+ description => "trivial resource bundles",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {},
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101',
+ ],
+ 'vm:102' => [
+ 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "simple resource bundle",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with first resource stopped",
+ services => {
+ 'vm:101' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:102' => [
+ 'vm:102', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with some stopped resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with moving resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'migrate',
+ node => 'node2',
+ target => 'node1',
+ },
+ 'vm:103' => {
+ state => 'relocate',
+ node => 'node3',
+ target => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+ # might happen if the resource bundle is generated even before the HA Manager
+ # puts the HA resources in migrate/relocate to make them adhere to the HA rules
+ {
+ description => "resource bundle with resources on different nodes",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node2',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node3',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+];
+
+my $tests = [
+ @$get_active_stationary_resource_bundle_tests,
+];
+
+plan(tests => scalar($tests->@*));
+
+for my $case ($get_active_stationary_resource_bundle_tests->@*) {
+ my ($ss, $resource_affinity) = $case->@{qw(services resource_affinity)};
+
+ my $result = PVE::HA::Manager::get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ is_deeply($result, $case->{resource_bundles}, $case->{description});
+}
+
+done_testing();
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread* [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (19 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 20/28] test: add resource bundle generation test cases Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:21 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
` (8 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the automatic load
rebalancer using the dynamic usage stats.
As an overview:
- Case 0: rebalancing system is inactive for no configured HA resources
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance, and converges once
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through dynamic
changes in their usage
- Case 4: rebalancing system doesn't trigger a migration if the node
imbalance threshold is exceeded once but isn't sustained for at least
the set hold duration
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 was changed to have an initial imbalance below 0.3 instead of
an imbalance below 0.7 as before
- case 4 was changed to have an imbalance below 0.3 instead of below 0.7
to not trigger a rebalancing migration during the transient spike
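
As a note for reviewers: the imbalance values in the expected logs below
(e.g. 1.41 -> 0.94 -> 0.35 in the rebalance2 case) are consistent with the
coefficient of variation (population standard deviation divided by mean) of
the per-node usage. The sketch below is inferred from those log values only,
not taken from the scheduler implementation, and uses abstract relative
usage units:

```python
import statistics

def imbalance(node_usages):
    # Coefficient of variation: population stddev / mean of per-node usage.
    # Inferred interpretation of the "imbalance" values in log.expect.
    mean = statistics.fmean(node_usages)
    if mean == 0:
        return 0.0
    return statistics.pstdev(node_usages) / mean

# rebalance2: four equal services start on node1 of three homogeneous nodes.
print(round(imbalance([4, 0, 0]), 2))  # 1.41 (before any migration)
print(round(imbalance([3, 1, 0]), 2))  # 0.94 (after vm:101 -> node2)
print(round(imbalance([2, 1, 1]), 2))  # 0.35 (after vm:102 -> node3)
```

This also shows why the single-resource case (rebalance1) stays put: with
one service, every candidate migration just moves the same usage to another
node and cannot improve the coefficient of variation.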
.../test-crs-dynamic-auto-rebalance0/README | 2 +
.../test-crs-dynamic-auto-rebalance0/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../test-crs-dynamic-auto-rebalance1/README | 7 ++
.../test-crs-dynamic-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-dynamic-auto-rebalance2/README | 4 +
.../test-crs-dynamic-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-dynamic-auto-rebalance3/README | 4 +
.../test-crs-dynamic-auto-rebalance3/cmdlist | 16 ++++
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 89 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
.../test-crs-dynamic-auto-rebalance4/README | 11 +++
.../test-crs-dynamic-auto-rebalance4/cmdlist | 13 +++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
45 files changed, 459 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/README b/src/test/test-crs-dynamic-auto-rebalance0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/log.expect b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/manager_status b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/service_config b/src/test/test-crs-dynamic-auto-rebalance0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/README b/src/test/test-crs-dynamic-auto-rebalance1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/service_config b/src/test/test-crs-dynamic-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/README b/src/test/test-crs-dynamic-auto-rebalance2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources from a single node to
+other cluster nodes to reach minimal cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
new file mode 100644
index 00000000..3d79026e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/service_config b/src/test/test-crs-dynamic-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/README b/src/test/test-crs-dynamic-auto-rebalance3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
new file mode 100644
index 00000000..275f7aec
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected change for imbalance from 0.85 to 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 240 node1/crm: auto rebalance - migrate vm:103 to node1 (expected change for imbalance from 0.81 to 0.40)
+info 240 node1/crm: got crm command: migrate vm:103 node1
+info 240 node1/crm: migrate service 'vm:103' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 243 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 243 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 260 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 261 node1/lrm: starting service vm:103
+info 261 node1/lrm: service status vm:103 started
+info 320 node1/crm: auto rebalance - migrate vm:105 to node3 (expected change for imbalance from 0.40 to 0.21)
+info 320 node1/crm: got crm command: migrate vm:105 node3
+info 320 node1/crm: migrate service 'vm:105' to node 'node3'
+info 320 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 323 node2/lrm: service vm:105 - start migrate to node 'node3'
+info 323 node2/lrm: service vm:105 - end migrate to node 'node3'
+info 340 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node3)
+info 345 node3/lrm: starting service vm:105
+info 345 node3/lrm: service status vm:105 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/service_config b/src/test/test-crs-dynamic-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/README b/src/test/test-crs-dynamic-auto-rebalance4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage spikes transiently above the imbalance threshold but falls back below
+it before the hold duration expires.
+
+This test relies on the fact that every command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values that bring the current imbalance back below the
+threshold right after simulating the spike undercuts the hold duration by one
+HA round, so no rebalancing migration is triggered.
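The hold-duration gating described above can be sketched as follows; the function name, the threshold value, and the round-by-round model are illustrative assumptions, not the actual ha-manager implementation:

```python
# Hedged sketch (assumed names/values, not ha-manager code): a hold
# duration only allows rebalancing once the imbalance has stayed above
# the threshold for enough consecutive HA rounds.

HOLD_DURATION = 6  # rounds the imbalance must stay above the threshold
THRESHOLD = 0.3    # assumed imbalance trigger threshold


def should_rebalance(imbalance_per_round, hold_duration=HOLD_DURATION,
                     threshold=THRESHOLD):
    """Return True once the imbalance exceeds the threshold for at
    least `hold_duration` consecutive rounds."""
    streak = 0
    for imbalance in imbalance_per_round:
        if imbalance > threshold:
            streak += 1
            if streak >= hold_duration:
                return True
        else:
            streak = 0  # spike ended, reset the counter
    return False


# A spike lasting 5 rounds (one cmdlist batch interval) is one round
# short of the hold duration, so no migration is triggered:
spike = [0.1] + [0.8] * 5 + [0.1] * 4
assert should_rebalance(spike) is False
# A spike sustained for 6 rounds would trigger it:
assert should_rebalance([0.1] + [0.8] * 6 + [0.1]) is True
```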
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..0b1d7625
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 768",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 1538",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 1538"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..14059a3e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
new file mode 100644
index 00000000..30898f18
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 768
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 1538
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 1538
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/service_config b/src/test/test-crs-dynamic-auto-rebalance4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
* Re: [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system test cases
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-04-02 13:21 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:21 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases document the basic behavior of the automatic load
> rebalancer using the dynamic usage stats.
>
> As an overview:
>
> - Case 0: rebalancing system is inactive for no configured HA resources
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
> for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance and converge if
> the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance through dynamic
> changes in their usage
> - Case 4: rebalancing system doesn't trigger a migration if the node
> imbalance is exceeded once but isn't sustained for at least
> the set hold duration
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 was changed to have an initial imbalance below 0.3 instead of
> an imbalance below 0.7 as before
> - case 4 was changed to have an imbalance below 0.3 instead of below 0.7
> to not trigger a rebalancing migration during the transient spike
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* [PATCH ha-manager v4 22/28] test: add static automatic rebalancing system test cases
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (20 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 21/28] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:23 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
` (7 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases are derivatives of the dynamic automatic rebalancing
system test cases 1 to 3, which ensure that the same basic functionality
is provided with the automatic rebalancing system with static usage
information.
The other dynamic usage test cases are not included here, because these
are invariant to the provided usage information and only test further
edge cases.
As an overview:
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converge if
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through changes
in their static usage
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 is a little more thorough now as it now uses 3 rebalancing
migrations to reach the minimum possible node imbalance
.../test-crs-static-auto-rebalance1/README | 7 ++
.../test-crs-static-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 25 +++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-static-auto-rebalance2/README | 4 +
.../test-crs-static-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 59 +++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-static-auto-rebalance3/README | 3 +
.../test-crs-static-auto-rebalance3/cmdlist | 15 +++
.../datacenter.cfg | 7 ++
.../hardware_status | 5 +
.../log.expect | 97 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
24 files changed, 291 insertions(+)
create mode 100644 src/test/test-crs-static-auto-rebalance1/README
create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance2/README
create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance3/README
create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats
diff --git a/src/test/test-crs-static-auto-rebalance1/README b/src/test/test-crs-static-auto-rebalance1/README
new file mode 100644
index 00000000..8f97ac55
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with static usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-static-auto-rebalance1/cmdlist b/src/test/test-crs-static-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance1/datacenter.cfg b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance1/hardware_status b/src/test/test-crs-static-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/log.expect b/src/test/test-crs-static-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d2c27bec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance1/manager_status b/src/test/test-crs-static-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance1/service_config b/src/test/test-crs-static-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/static_service_stats b/src/test/test-crs-static-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/README b/src/test/test-crs-static-auto-rebalance2/README
new file mode 100644
index 00000000..1d1b9d6e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with static usage information will
+rebalance multiple homogeneous HA resources running on a single node onto the
+other cluster nodes to reach a minimum cluster node imbalance in a homogeneous
+cluster.
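One way such a rebalancer could pick migrations can be sketched as follows; the imbalance metric (standard deviation of per-node CPU load fractions), the improvement threshold, and all names are illustrative assumptions mirroring this test's fixture data, not the actual ha-manager algorithm:

```python
from statistics import stdev

# Hedged sketch: greedily pick the migration that most reduces a node
# imbalance metric, but only if it improves it by a minimum amount.
# Values mirror the test fixtures above (4 homogeneous services on node1).

nodes = {"node1": 24.0, "node2": 24.0, "node3": 24.0}  # maxcpu per node
placement = {"vm:101": "node1", "vm:102": "node1",
             "vm:103": "node1", "vm:104": "node1"}
maxcpu = {svc: 2.0 for svc in placement}  # homogeneous services


def imbalance(placement):
    """Standard deviation of the per-node CPU load fractions."""
    load = {n: 0.0 for n in nodes}
    for svc, node in placement.items():
        load[node] += maxcpu[svc]
    return stdev(load[n] / nodes[n] for n in nodes)


def best_migration(placement, min_improvement=0.01):
    """Return (service, target, new_imbalance) for the single migration
    with the lowest resulting imbalance, or None if no migration beats
    the minimum improvement threshold."""
    base = imbalance(placement)
    best = None
    for svc, src in placement.items():
        for dst in nodes:
            if dst == src:
                continue
            score = imbalance({**placement, svc: dst})
            if base - score > min_improvement and (
                    best is None or score < best[2]):
                best = (svc, dst, score)
    return best


move = best_migration(placement)
# Moving any one service off the overloaded node1 lowers the imbalance,
# so a migration is proposed; repeating this converges like the test log.
assert move is not None and move[1] in ("node2", "node3")
```

With a single configured resource (as in the rebalance1 test above), every candidate migration merely shifts the load to another node without reducing the spread enough, so `best_migration` would return `None` and no migration is triggered.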
diff --git a/src/test/test-crs-static-auto-rebalance2/cmdlist b/src/test/test-crs-static-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance2/datacenter.cfg b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance2/hardware_status b/src/test/test-crs-static-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/log.expect b/src/test/test-crs-static-auto-rebalance2/log.expect
new file mode 100644
index 00000000..6a2ab89f
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance2/manager_status b/src/test/test-crs-static-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-auto-rebalance2/service_config b/src/test/test-crs-static-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/static_service_stats b/src/test/test-crs-static-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/README b/src/test/test-crs-static-auto-rebalance3/README
new file mode 100644
index 00000000..2f57dac2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/README
@@ -0,0 +1,3 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running HA resources, where the static usage stats of some
+HA resources change over time, to reach minimum cluster node imbalance.
diff --git a/src/test/test-crs-static-auto-rebalance3/cmdlist b/src/test/test-crs-static-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..f18798b0
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/cmdlist
@@ -0,0 +1,15 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:106 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:107 set-static-stats maxcpu 8.0 maxmem 8192"
+ ],
+ [
+ "service vm:101 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:102 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:103 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:104 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:105 set-static-stats maxcpu 1.0 maxmem 1024"
+ ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance3/datacenter.cfg b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance3/hardware_status b/src/test/test-crs-static-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/log.expect b/src/test/test-crs-static-auto-rebalance3/log.expect
new file mode 100644
index 00000000..ecf2d183
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/log.expect
@@ -0,0 +1,97 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:106 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:107 set-static-stats maxcpu 8.0 maxmem 8192
+info 160 node1/crm: auto rebalance - migrate vm:105 to node1 (expected change for imbalance from 0.88 to 0.47)
+info 160 node1/crm: got crm command: migrate vm:105 node1
+info 160 node1/crm: migrate service 'vm:105' to node 'node1'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node1'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node1'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 181 node1/lrm: starting service vm:105
+info 181 node1/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:102 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:103 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:104 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:105 set-static-stats maxcpu 1.0 maxmem 1024
+info 240 node1/crm: auto rebalance - migrate vm:106 to node2 (expected change for imbalance from 0.91 to 0.42)
+info 240 node1/crm: got crm command: migrate vm:106 node2
+info 240 node1/crm: migrate service 'vm:106' to node 'node2'
+info 240 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 245 node3/lrm: service vm:106 - start migrate to node 'node2'
+info 245 node3/lrm: service vm:106 - end migrate to node 'node2'
+info 260 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node2)
+info 263 node2/lrm: starting service vm:106
+info 263 node2/lrm: service status vm:106 started
+info 320 node1/crm: auto rebalance - migrate vm:103 to node1 (expected change for imbalance from 0.42 to 0.31)
+info 320 node1/crm: got crm command: migrate vm:103 node1
+info 320 node1/crm: migrate service 'vm:103' to node 'node1'
+info 320 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 323 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 323 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 340 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 341 node1/lrm: starting service vm:103
+info 341 node1/lrm: service status vm:103 started
+info 400 node1/crm: auto rebalance - migrate vm:104 to node1 (expected change for imbalance from 0.31 to 0.20)
+info 400 node1/crm: got crm command: migrate vm:104 node1
+info 400 node1/crm: migrate service 'vm:104' to node 'node1'
+info 400 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 403 node2/lrm: service vm:104 - start migrate to node 'node1'
+info 403 node2/lrm: service vm:104 - end migrate to node 'node1'
+info 420 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node1)
+info 421 node1/lrm: starting service vm:104
+info 421 node1/lrm: service status vm:104 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance3/manager_status b/src/test/test-crs-static-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance3/service_config b/src/test/test-crs-static-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/static_service_stats b/src/test/test-crs-static-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..560a6fe8
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:105": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:106": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:107": { "maxcpu": 2.0, "maxmem": 2147483648 }
+}
--
2.47.3
^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH ha-manager v4 22/28] test: add static automatic rebalancing system test cases
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
@ 2026-04-02 13:23 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:23 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this:
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases are derivatives of the dynamic automatic rebalancing
> system test cases 1 to 3, which ensure that the same basic functionality
> is provided by the automatic rebalancing system using static usage
> information.
>
> The other dynamic usage test cases are not included here, because these
> are invariant to the provided usage information and only test further
> edge cases.
>
> As an overview:
>
> - Case 1: rebalancing system doesn't trigger any rebalancing migrations
> for a single, configured HA resource
> - Case 2: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance and converge if
> the imbalance falls below the threshold
> - Case 3: rebalancing system triggers migrations if the running HA
> resources cause a significant node imbalance through changes
> in their static usage
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 is a little more thorough now as it now uses 3 rebalancing
> migrations to reach the minimum possible node imbalance
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (21 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 22/28] test: add static " Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:29 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
` (6 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases are clones of the dynamic automatic rebalancing system
test cases 0 through 4, which ensure that the same basic functionality
is provided by the automatic rebalancing system using the TOPSIS
method.
The expected outputs are exactly the same, except for test case 3, which
changes the second migration from
vm:103 to node1 with an expected target imbalance of 0.40
to
vm:103 to node3 with an expected target imbalance of 0.43.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- case 3 was changed to have an initial imbalance below 0.3 instead of
an imbalance below 0.7 as before
- case 4 was changed to have an imbalance below 0.3 instead of below 0.7
so that no rebalancing migration is triggered during the transient spike
.../README | 2 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../README | 7 ++
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../README | 4 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 4 +
.../cmdlist | 16 ++++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 89 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
.../README | 11 +++
.../cmdlist | 13 +++
.../datacenter.cfg | 9 ++
.../dynamic_service_stats | 9 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++
.../manager_status | 1 +
.../service_config | 9 ++
.../static_service_stats | 9 ++
45 files changed, 464 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/README b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/README b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/README b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources on a single node to other
+cluster nodes to reach a minimum cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
new file mode 100644
index 00000000..3d79026e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected change for imbalance from 0.94 to 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/README b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
new file mode 100644
index 00000000..c9fc29e0
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
@@ -0,0 +1,89 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected change for imbalance from 0.85 to 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 240 node1/crm: auto rebalance - migrate vm:103 to node3 (expected change for imbalance from 0.81 to 0.43)
+info 240 node1/crm: got crm command: migrate vm:103 node3
+info 240 node1/crm: migrate service 'vm:103' to node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 243 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 243 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 260 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 265 node3/lrm: starting service vm:103
+info 265 node3/lrm: service status vm:103 started
+info 320 node1/crm: auto rebalance - migrate vm:105 to node1 (expected change for imbalance from 0.43 to 0.24)
+info 320 node1/crm: got crm command: migrate vm:105 node1
+info 320 node1/crm: migrate service 'vm:105' to node 'node1'
+info 320 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 323 node2/lrm: service vm:105 - start migrate to node 'node1'
+info 323 node2/lrm: service vm:105 - end migrate to node 'node1'
+info 340 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 341 node1/lrm: starting service vm:105
+info 341 node1/lrm: service status vm:105 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/README b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger any rebalancing migration for running HA resources whose dynamic usage
+shows a transient spike: the spike pushes the node imbalance above the
+threshold, but the imbalance falls below the threshold again before the hold
+duration expires.
+
+This test relies on the fact that command batches in the `cmdlist` file are
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values that bring the imbalance back below the threshold
+in the very next batch after the spike undercuts the hold duration by one HA
+round, so no rebalancing migration is triggered.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
new file mode 100644
index 00000000..0b1d7625
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 768",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 1538",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 1538"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
new file mode 100644
index 00000000..0fb3fdc3
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis",
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
new file mode 100644
index 00000000..16c174a5
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 805306368 },
+ "vm:106": { "cpu": 2.9, "mem": 1612709888 },
+ "vm:107": { "cpu": 2.1, "mem": 1612709888 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
new file mode 100644
index 00000000..30898f18
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 768
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 1538
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 1538
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
* Re: [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-04-02 13:29 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:29 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> These test cases are clones of the dynamic automatic rebalancing system
> test cases 0 through 4, which ensure that the same basic functionality
> is provided with the automatic rebalancing system using the TOPSIS
> method.
>
> The expected outputs are exactly the same, but for test case 3, which
> changes the second migration from
>
> vm:103 to node1 with an expected target imbalance of 0.40
>
> to
>
> vm:103 to node3 with an expected target imbalance of 0.43.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - case 3 was changed to have an initial imbalance below 0.3 instead of
> an imbalance below 0.7 as before
> - case 4 was changed to have an imbalance below 0.3 instead of below 0.7
> to not trigger a rebalancing migration inbetween the transient spike
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
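As background for reviewers unfamiliar with the method: the candidate ranking behind `ha-auto-rebalance-method=topsis` follows the standard TOPSIS procedure (normalize, weight, measure distance to the ideal-best and ideal-worst alternatives, rank by relative closeness). The sketch below is a minimal generic illustration, not the ha-manager implementation; the criteria, weights, and candidate values are made-up assumptions.

```python
import math

def topsis_rank(matrix, weights, benefit):
    """Rank alternatives (rows of `matrix`) by TOPSIS relative closeness.

    weights: one weight per criterion column (should sum to 1).
    benefit: one flag per column; True = higher is better, False = cost.
    Returns candidate row indices, best first.
    """
    cols = list(zip(*matrix))
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(v * v for v in col)) or 1.0 for col in cols]
    weighted = [[w * v / n for v, w, n in zip(row, weights, norms)]
                for row in matrix]
    wcols = list(zip(*weighted))
    # Ideal-best and ideal-worst virtual alternatives per criterion.
    best = [max(c) if b else min(c) for c, b in zip(wcols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(wcols, benefit)]

    def dist(row, ref):
        return math.sqrt(sum((v - r) ** 2 for v, r in zip(row, ref)))

    # Relative closeness: 1.0 = at the ideal, 0.0 = at the anti-ideal.
    scores = []
    for row in weighted:
        d_best, d_worst = dist(row, best), dist(row, worst)
        scores.append(d_worst / (d_best + d_worst) if d_best + d_worst else 0.0)
    return sorted(range(len(matrix)), key=scores.__getitem__, reverse=True)

# Hypothetical candidate migrations, scored by the expected post-migration
# CPU and memory imbalance (both cost criteria: lower is better).
candidates = [
    [0.42, 0.50],   # e.g. migrate vm:105 to node2
    [0.85, 0.90],   # e.g. keep everything in place
    [0.60, 0.40],   # e.g. migrate vm:106 to node1
]
ranking = topsis_rank(candidates, weights=[0.5, 0.5], benefit=[False, False])
```

With these made-up numbers the first candidate wins because it improves both criteria most evenly, which mirrors how the expected-imbalance figures in the `log.expect` files above drive the choice of migration.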
* [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (22 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 23/28] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 12:44 ` [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable Daniel Kral
` (5 subsequent siblings)
29 siblings, 0 replies; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
These test cases document and verify some behaviors of the automatic
rebalancing system in combination with HA affinity rules.
All of these test cases use only dynamic usage information and the
bruteforce method, as waiting on ongoing migrations and candidate
generation are invariant to those parameters.
As an overview:
- Case 1: rebalancing system acknowledges node affinity rules
- Case 2: rebalancing system considers HA resources in strict positive
resource affinity rules as a single unit (a resource bundle)
and will not split them apart
- Case 3: rebalancing system will wait on the migration of a not-yet
enforced strict positive resource affinity rule, i.e., the
HA resources still need to migrate to their common node
- Case 4: rebalancing system will acknowledge strict negative resource
affinity rules, but will still try to minimize the node
imbalance as much as possible
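The four behaviors above boil down to constraining candidate generation: resources bound by a strict positive rule form one bundle that moves as a unit (case 2), a bundle split across nodes is skipped until its migrations settle (case 3), and a target is rejected if a negative-affinity peer already sits there (case 4). The sketch below is a rough illustration of that filtering under those assumptions; the function name, rule representation, and data all are hypothetical and not the ha-manager API.

```python
def migration_candidates(placement, positive_rules, negative_rules, nodes):
    """Yield (bundle, target) moves that respect strict affinity rules.

    placement:      dict resource -> current node
    positive_rules: sets of resources that must share a node
    negative_rules: sets of resources that must be on pairwise-distinct nodes
    """
    # Merge resources connected by strict positive rules into one bundle.
    bundles = {res: frozenset([res]) for res in placement}
    for group in positive_rules:
        merged = frozenset().union(*(bundles[r] for r in group))
        for r in merged:
            bundles[r] = merged
    for bundle in set(bundles.values()):
        srcs = {placement[r] for r in bundle}
        if len(srcs) != 1:
            continue  # rule not yet enforced: wait, don't rebalance (case 3)
        (current,) = srcs
        for target in nodes:
            if target == current:
                continue
            blocked = any(
                placement[other] == target
                for rule in negative_rules if not rule.isdisjoint(bundle)
                for other in rule - bundle
            )
            if not blocked:
                yield bundle, target  # whole bundle moves as one unit (case 2)

# Hypothetical example: vm:101/vm:102 are positively bound, vm:101 and
# vm:103 must stay apart, so node3 is the only admissible target for both.
placement = {"vm:101": "node1", "vm:102": "node1", "vm:103": "node2"}
cands = set(migration_candidates(
    placement,
    positive_rules=[{"vm:101", "vm:102"}],
    negative_rules=[{"vm:101", "vm:103"}],
    nodes=["node1", "node2", "node3"],
))
```

The rebalancer would then score only these admissible moves by expected imbalance, which is why case 4 can still reduce imbalance without ever violating a negative rule.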
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
---
changes v3 -> v4:
- none
.../README | 7 +++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 5 ++
.../hardware_status | 5 ++
.../log.expect | 49 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 ++
.../service_config | 5 ++
.../static_service_stats | 5 ++
.../README | 12 ++++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 4 ++
.../hardware_status | 5 ++
.../log.expect | 53 +++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 4 ++
.../static_service_stats | 4 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 8 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 31 ++++++++++
.../rules_config | 3 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 7 +++
.../service_config | 6 ++
.../static_service_stats | 6 ++
40 files changed, 452 insertions(+)
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/README b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
new file mode 100644
index 00000000..8504755f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance running HA resources that cause a node imbalance exceeding the
+threshold, because their strict HA node affinity rules require them to be
+kept on specific nodes.
+
+As a sanity check, the added HA resource, which is not part of the node
+affinity rule, is rebalanced to another node to lower the imbalance.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..6ee04948
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:104 add node1 started 1",
+ "service vm:104 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:104 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..02133ab0
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
new file mode 100644
index 00000000..c9267997
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
@@ -0,0 +1,49 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:104 add node1 started 1
+info 120 cmdlist: execute service vm:104 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:104 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:104' on node 'node1'
+info 120 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:104 to node2 (expected change for imbalance from 1.41 to 0.98)
+info 140 node1/crm: got crm command: migrate vm:104 node2
+info 140 node1/crm: migrate service 'vm:104' to node 'node2'
+info 140 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:104
+info 163 node2/lrm: service status vm:104 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
new file mode 100644
index 00000000..00f615e9
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-stays-on-node1
+ nodes node1
+ resources vm:101,vm:102,vm:103
+ strict 1
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
new file mode 100644
index 00000000..57e3579d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..b11cc5eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/README b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
new file mode 100644
index 00000000..be072f6d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
@@ -0,0 +1,12 @@
+Test that the auto rebalance system with dynamic usage information will
+consider running HA resources in strict positive resource affinity rules as
+bundles, which can only be moved to other nodes as a single unit.
+
+Therefore, even though the two initial HA resources cause a node imbalance in
+the cluster and would otherwise be split apart, the auto rebalance system does
+not issue a rebalancing migration, because they must stay together.
+
+As a sanity check, adding another HA resource, which is not part of the strict
+positive resource affinity rule, causes a rebalancing migration: in this case
+the whole resource bundle is moved, because its leading resource 'vm:101' is
+alphabetically first.
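The bundling behaviour this test exercises can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual ha-manager Perl code; the function and variable names are hypothetical:

```python
def affinity_bundles(resources, positive_rules):
    """Group resource IDs into bundles: resources joined (transitively) by
    strict positive affinity rules end up in the same bundle, which the
    rebalancer may only move as a single unit."""
    # union-find over resource IDs
    parent = {rid: rid for rid in resources}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for rule in positive_rules:
        members = [r for r in rule if r in parent]
        for a, b in zip(members, members[1:]):
            parent[find(a)] = find(b)

    bundles = {}
    for rid in resources:
        bundles.setdefault(find(rid), set()).add(rid)
    # sort each bundle so its alphabetically first member leads, mirroring
    # the "leading resource" wording in the test description
    return sorted(sorted(b) for b in bundles.values())

bundles = affinity_bundles(
    ["vm:101", "vm:102", "vm:103"],
    [["vm:101", "vm:102"]],
)
# bundles == [["vm:101", "vm:102"], ["vm:103"]]
```

With this grouping, moving vm:101 implies moving vm:102 as well, which matches the paired migrations visible in the expected log above.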
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..61373367
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:103 add node1 started 1",
+ "service vm:103 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:103 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..4f81dfe2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
new file mode 100644
index 00000000..26be9421
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:103 add node1 started 1
+info 120 cmdlist: execute service vm:103 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:103 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:103' on node 'node1'
+info 120 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:101 to node2 (expected change for imbalance from 1.41 to 0.86)
+info 140 node1/crm: got crm command: migrate vm:101 node2
+info 140 node1/crm: crm command 'migrate vm:101 node2' - migrate service 'vm:102' to node 'node2' (service 'vm:102' in positive affinity with service 'vm:101')
+info 140 node1/crm: migrate service 'vm:101' to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 140 node1/crm: migrate service 'vm:102' to node 'node2'
+info 140 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 160 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:101
+info 163 node2/lrm: service status vm:101 started
+info 163 node2/lrm: starting service vm:102
+info 163 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
new file mode 100644
index 00000000..e1948a00
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:102
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
new file mode 100644
index 00000000..880e0a59
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..455ae043
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/README b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
new file mode 100644
index 00000000..4b4d4855
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information waits for a
+resource migration to finish while a strict positive resource affinity rule is
+not yet fully enforced.
+
+This test case manipulates the manager status so that the HA Manager assumes
+that the not-yet-migrated HA resource in the strict positive resource affinity
+rule is still migrating, since the integration tests currently do not support
+prolonged migrations.
+
+Furthermore, setting the hold duration to 0 forces auto rebalancing migrations
+to be issued as soon as possible. This ensures that if the auto rebalance
+system did not wait for the ongoing migration, the rebalancing migration would
+be issued right away in the same round in which the HA resources are
+acknowledged as running.
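The waiting behaviour described above amounts to a guard like the following. This is an illustrative Python sketch, not the actual Perl implementation; the `can_rebalance` name is hypothetical, while the 'migrate' state mirrors the one visible in the manager_status fixture below:

```python
def can_rebalance(bundle, service_status):
    """Only consider a bundle for rebalancing once none of its members is
    in an ongoing motion; otherwise defer to a later scheduling round."""
    blocking = {"migrate", "relocate"}  # assumed set of in-motion states
    return all(
        service_status.get(rid, {}).get("state") not in blocking
        for rid in bundle
    )

status = {
    "vm:101": {"node": "node2", "state": "migrate", "target": "node1"},
    "vm:103": {"node": "node1", "state": "started"},
}
can_rebalance(["vm:101", "vm:103"], status)  # False: vm:101 still migrating
```

Once vm:101 is acknowledged as started on node1, the guard passes and the rebalancer is free to move the whole bundle, as the expected log shows.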
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..181ea848
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 0
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..d35a2c8f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 },
+ "vm:104": { "cpu": 4.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
new file mode 100644
index 00000000..35282c7d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 23 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 60 node1/crm: auto rebalance - migrate vm:102 to node2 (expected change for imbalance from 1.41 to 0.72)
+info 60 node1/crm: got crm command: migrate vm:102 node2
+info 60 node1/crm: migrate service 'vm:102' to node 'node2'
+info 60 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 61 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 61 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 80 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 83 node2/lrm: starting service vm:102
+info 83 node2/lrm: service status vm:102 started
+info 100 node1/crm: auto rebalance - migrate vm:101 to node3 (expected change for imbalance from 0.72 to 0.27)
+info 100 node1/crm: got crm command: migrate vm:101 node3
+info 100 node1/crm: crm command 'migrate vm:101 node3' - migrate service 'vm:103' to node 'node3' (service 'vm:103' in positive affinity with service 'vm:101')
+info 100 node1/crm: migrate service 'vm:101' to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 100 node1/crm: migrate service 'vm:103' to node 'node3'
+info 100 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 101 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 105 node3/lrm: got lock 'ha_agent_node3_lock'
+info 105 node3/lrm: status change wait_for_agent_lock => active
+info 120 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 120 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 125 node3/lrm: starting service vm:101
+info 125 node3/lrm: service status vm:101 started
+info 125 node3/lrm: starting service vm:103
+info 125 node3/lrm: service status vm:103 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
new file mode 100644
index 00000000..cf90037c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
@@ -0,0 +1,31 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"online",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node2",
+ "state": "migrate",
+ "target": "node1",
+ "uid": "RoPGTlvNYq/oZFokv9fgWw"
+ },
+ "vm:102": {
+ "node": "node1",
+ "state": "started",
+ "uid": "fR3i18EHk6DhF8Zd2jddNX"
+ },
+ "vm:103": {
+ "node": "node1",
+ "state": "started",
+ "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+ },
+ "vm:104": {
+ "node": "node1",
+ "state": "started",
+ "uid": "23hk23EHk6DhF8Zd0218DD"
+ }
+ }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
new file mode 100644
index 00000000..2c3f3171
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:103
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
new file mode 100644
index 00000000..3dadaabc
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/README b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
new file mode 100644
index 00000000..e304cc22
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance an HA resource onto the same node as another HA resource that it is
+in a strict negative resource affinity rule with.
+
+There is a high node imbalance, since vm:101 and vm:102 on node1 cause a
+higher usage than node2 and node3 have. Even though it would be ideal to move
+one of them to node2, which has a very low usage, neither can be moved there,
+as vm:101 and vm:102 are each in a strict negative resource affinity rule with
+an HA resource on node2.
+
+To minimize the imbalance in the cluster, one of the HA resources on node1 is
+migrated to node3 first; afterwards the HA resource on node3, which is not in
+a strict negative resource affinity rule with an HA resource on node2, is
+migrated to node2.
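The target-node filtering this test relies on can be sketched as follows. This is an illustrative Python sketch, not the actual ha-manager code; the `allowed_targets` name and data shapes are hypothetical, while the placement and rules match this test's fixtures:

```python
def allowed_targets(rid, nodes, placement, negative_rules):
    """Candidate nodes a resource may be rebalanced to: exclude every node
    already running a resource it shares a strict negative affinity rule
    with."""
    forbidden = set()
    for rule in negative_rules:
        if rid in rule:
            forbidden |= {placement[other] for other in rule if other != rid}
    return [n for n in nodes if n not in forbidden]

placement = {"vm:101": "node1", "vm:102": "node1",
             "vm:103": "node2", "vm:104": "node3"}
rules = [{"vm:101", "vm:103"}, {"vm:102", "vm:103"}]

allowed_targets("vm:101", ["node1", "node2", "node3"], placement, rules)
# ["node1", "node3"]: node2 is excluded because vm:103 runs there
```

This is why the rebalancer first moves a node1 resource to node3 and only then frees up node3 by moving the unconstrained vm:104 to node2.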
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..083f338b
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 4294967296 },
+ "vm:102": { "cpu": 2.4, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.0, "mem": 0 },
+ "vm:104": { "cpu": 1.0, "mem": 1073741824 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
new file mode 100644
index 00000000..cd87f3a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 80 node1/crm: auto rebalance - migrate vm:101 to node3 (expected change for imbalance from 1.04 to 0.72)
+info 80 node1/crm: got crm command: migrate vm:101 node3
+info 80 node1/crm: migrate service 'vm:101' to node 'node3'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 105 node3/lrm: starting service vm:101
+info 105 node3/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:104 to node2 (expected change for imbalance from 0.72 to 0.33)
+info 160 node1/crm: got crm command: migrate vm:104 node2
+info 160 node1/crm: migrate service 'vm:104' to node 'node2'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:104 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:104 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:104
+info 183 node2/lrm: service status vm:104 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
new file mode 100644
index 00000000..eef5460f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
@@ -0,0 +1,7 @@
+resource-affinity: vms-stay-apart1
+ resources vm:101,vm:103
+ affinity negative
+
+resource-affinity: vms-stay-apart2
+ resources vm:102,vm:103
+ affinity negative
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
new file mode 100644
index 00000000..16bffacf
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (23 preceding siblings ...)
2026-04-02 12:44 ` [PATCH ha-manager v4 24/28] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:33 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs Daniel Kral
` (4 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Suggested-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
www/manager6/dc/OptionView.js | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index e80c6457..7136c914 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -195,8 +195,8 @@ Ext.define('PVE.dc.OptionView', {
value: '__default__',
comboItems: [
['__default__', Proxmox.Utils.defaultText + ' (basic)'],
- ['basic', 'Basic (Resource Count)'],
- ['static', 'Static Load'],
+ ['basic', gettext('Basic (Resource Count)')],
+ ['static', gettext('Static Load')],
],
defaultValue: '__default__',
},
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (24 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 25/28] ui: dc/options: make the ha crs strings translatable Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:33 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component Daniel Kral
` (3 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new! (but nothing changed since v1)
www/manager6/dc/OptionView.js | 1 +
1 file changed, 1 insertion(+)
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index 7136c914..fa87832b 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -197,6 +197,7 @@ Ext.define('PVE.dc.OptionView', {
['__default__', Proxmox.Utils.defaultText + ' (basic)'],
['basic', gettext('Basic (Resource Count)')],
['static', gettext('Static Load')],
+ ['dynamic', gettext('Dynamic Load')],
],
defaultValue: '__default__',
},
--
2.47.3
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (25 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 26/28] ui: dc/options: add dynamic load scheduler option for ha crs Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:35 ` Dominik Rusovac
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
` (2 subsequent siblings)
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
This is in preparation for the next patch, which adds a view model to the
component so that options can depend on each other's state.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
www/manager6/Makefile | 1 +
www/manager6/dc/OptionView.js | 38 +++++-----------------
www/manager6/form/CRSOptions.js | 56 +++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 31 deletions(-)
create mode 100644 www/manager6/form/CRSOptions.js
diff --git a/www/manager6/Makefile b/www/manager6/Makefile
index 4558d53e..b0c94b32 100644
--- a/www/manager6/Makefile
+++ b/www/manager6/Makefile
@@ -27,6 +27,7 @@ JSSRC= \
form/BridgeSelector.js \
form/BusTypeSelector.js \
form/CPUModelSelector.js \
+ form/CRSOptions.js \
form/CacheTypeSelector.js \
form/CalendarEvent.js \
form/CephPoolSelector.js \
diff --git a/www/manager6/dc/OptionView.js b/www/manager6/dc/OptionView.js
index fa87832b..7c8d3792 100644
--- a/www/manager6/dc/OptionView.js
+++ b/www/manager6/dc/OptionView.js
@@ -180,38 +180,14 @@ Ext.define('PVE.dc.OptionView', {
},
],
});
- me.add_inputpanel_row('crs', gettext('Cluster Resource Scheduling'), {
+ me.rows.crs = {
+ required: true,
renderer: PVE.Utils.render_as_property_string,
- width: 450,
- labelWidth: 120,
- url: '/api2/extjs/cluster/options',
- onlineHelp: 'ha_manager_crs',
- items: [
- {
- xtype: 'proxmoxKVComboBox',
- name: 'ha',
- fieldLabel: gettext('HA Scheduling'),
- deleteEmpty: false,
- value: '__default__',
- comboItems: [
- ['__default__', Proxmox.Utils.defaultText + ' (basic)'],
- ['basic', gettext('Basic (Resource Count)')],
- ['static', gettext('Static Load')],
- ['dynamic', gettext('Dynamic Load')],
- ],
- defaultValue: '__default__',
- },
- {
- xtype: 'proxmoxcheckbox',
- name: 'ha-rebalance-on-start',
- fieldLabel: gettext('Rebalance on Start'),
- boxLabel: gettext(
- 'Use CRS to select the least loaded node when starting an HA service',
- ),
- value: 0,
- },
- ],
- });
+ header: gettext('Cluster Resource Scheduling'),
+ editor: {
+ xtype: 'pveCRSOptions',
+ },
+ };
me.add_inputpanel_row('u2f', gettext('U2F Settings'), {
renderer: (v) =>
!v ? Proxmox.Utils.NoneText : Ext.htmlEncode(PVE.Parser.printPropertyString(v)),
diff --git a/www/manager6/form/CRSOptions.js b/www/manager6/form/CRSOptions.js
new file mode 100644
index 00000000..b22c5c99
--- /dev/null
+++ b/www/manager6/form/CRSOptions.js
@@ -0,0 +1,56 @@
+Ext.define('PVE.form.CRSOptions', {
+ extend: 'Proxmox.window.Edit',
+ alias: 'widget.pveCRSOptions',
+
+ width: 450,
+ url: '/api2/extjs/cluster/options',
+ onlineHelp: 'ha_manager_crs',
+
+ fieldDefaults: {
+ labelWidth: 120,
+ },
+
+ setValues: function (values) {
+ Ext.Array.each(this.query('inputpanel'), (panel) => {
+ panel.setValues(values.crs);
+ });
+ },
+
+ items: [
+ {
+ xtype: 'inputpanel',
+ onGetValues: function (values) {
+ if (values === undefined || Object.keys(values).length === 0) {
+ return { delete: 'crs' };
+ } else {
+ return { crs: PVE.Parser.printPropertyString(values) };
+ }
+ },
+ items: [
+ {
+ xtype: 'proxmoxKVComboBox',
+ name: 'ha',
+ fieldLabel: gettext('HA Scheduling'),
+ deleteEmpty: false,
+ value: '__default__',
+ comboItems: [
+ ['__default__', Proxmox.Utils.defaultText + ' (basic)'],
+ ['basic', gettext('Basic (Resource Count)')],
+ ['static', gettext('Static Load')],
+ ['dynamic', gettext('Dynamic Load')],
+ ],
+ defaultValue: '__default__',
+ },
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'ha-rebalance-on-start',
+ fieldLabel: gettext('Rebalance on Start'),
+ boxLabel: gettext(
+ 'Use CRS to select the least loaded node when starting an HA service',
+ ),
+ value: 0,
+ },
+ ],
+ },
+ ],
+});
--
2.47.3
* [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (26 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 27/28] ui: move cluster resource scheduling from dc/options into separate component Daniel Kral
@ 2026-04-02 12:44 ` Daniel Kral
2026-04-02 13:38 ` Dominik Rusovac
2026-04-02 14:24 ` [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Dominik Rusovac
2026-04-02 16:07 ` applied: " Thomas Lamprecht
29 siblings, 1 reply; 41+ messages in thread
From: Daniel Kral @ 2026-04-02 12:44 UTC (permalink / raw)
To: pve-devel
ha-auto-rebalance-{threshold,method,hold-duration,margin} require
ha-auto-rebalance to be enabled in the schema, therefore they are
disabled here unless ha-auto-rebalance is enabled.
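The schema constraint described above can be pictured with a small plain-JavaScript check (a hypothetical sketch, not the actual JSON schema validation; all names except the option keys are invented):

```javascript
// The auto-rebalance sub-options are only meaningful while
// 'ha-auto-rebalance' itself is enabled in the crs config.
const AUTO_REBALANCE_SUBOPTS = [
    'ha-auto-rebalance-threshold',
    'ha-auto-rebalance-method',
    'ha-auto-rebalance-hold-duration',
    'ha-auto-rebalance-margin',
];

// Returns the sub-options that are set even though the master switch
// is off - i.e. the combinations the UI disables below.
function invalidSubOptions(crs) {
    if (crs['ha-auto-rebalance']) {
        return []; // enabled: all sub-options are allowed
    }
    return AUTO_REBALANCE_SUBOPTS.filter((opt) => opt in crs);
}
```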
The label width was enlarged a bit, so that the longer labels for the
auto rebalancing options are more readable.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v3 -> v4:
- new!
- the only changes from v1 are that a separate component is used now,
that it uses a viewModel to disable fields that shouldn't be set, and
that the label width was widened a bit;
- also 'Margin' is 'Minimum Imbalance Improvement' in the UI
www/manager6/form/CRSOptions.js | 62 ++++++++++++++++++++++++++++++++-
1 file changed, 61 insertions(+), 1 deletion(-)
diff --git a/www/manager6/form/CRSOptions.js b/www/manager6/form/CRSOptions.js
index b22c5c99..b5476bd5 100644
--- a/www/manager6/form/CRSOptions.js
+++ b/www/manager6/form/CRSOptions.js
@@ -7,7 +7,7 @@ Ext.define('PVE.form.CRSOptions', {
onlineHelp: 'ha_manager_crs',
fieldDefaults: {
- labelWidth: 120,
+ labelWidth: 150,
},
setValues: function (values) {
@@ -16,6 +16,8 @@ Ext.define('PVE.form.CRSOptions', {
});
},
+ viewModel: {},
+
items: [
{
xtype: 'inputpanel',
@@ -50,6 +52,64 @@ Ext.define('PVE.form.CRSOptions', {
),
value: 0,
},
+ {
+ xtype: 'proxmoxcheckbox',
+ name: 'ha-auto-rebalance',
+ fieldLabel: gettext('Automatic Rebalance'),
+ boxLabel: gettext('Automatically rebalance HA resources'),
+ value: 0,
+ reference: 'enableAutoRebalance',
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-threshold',
+ fieldLabel: gettext('Imbalance Threshold'),
+ emptyText: '0.3',
+ minValue: 0.0,
+ step: 0.01,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'proxmoxKVComboBox',
+ name: 'ha-auto-rebalance-method',
+ fieldLabel: gettext('Rebalancing Method'),
+ deleteEmpty: false,
+ value: '__default__',
+ comboItems: [
+ ['__default__', Proxmox.Utils.defaultText + ' (bruteforce)'],
+ ['bruteforce', 'Bruteforce'],
+ ['topsis', 'TOPSIS'],
+ ],
+ defaultValue: '__default__',
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-hold-duration',
+ fieldLabel: gettext('Hold Duration'),
+ emptyText: '3',
+ minValue: 0,
+ step: 1,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
+ {
+ xtype: 'numberfield',
+ name: 'ha-auto-rebalance-margin',
+ fieldLabel: gettext('Minimum Imbalance Improvement'),
+ emptyText: '0.1',
+ minValue: 0.0,
+ maxValue: 1.0,
+ step: 0.01,
+ bind: {
+ disabled: '{!enableAutoRebalance.checked}',
+ },
+ },
],
},
],
--
2.47.3
* Re: [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
@ 2026-04-02 13:38 ` Dominik Rusovac
0 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 13:38 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, consider this
On Thu Apr 2, 2026 at 2:44 PM CEST, Daniel Kral wrote:
> ha-auto-rebalance-{threshold,method,hold-duration,margin} require
> ha-auto-rebalance to be enabled in the schema, therefore they are
> disabled here unless ha-auto-rebalance is enabled.
good measure
>
> The label width was enlarged a bit, so that the longer labels for the
> auto rebalancing options are more readable.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v3 -> v4:
> - new!
> - the only changes from v1 are that a separate component is used now,
> that it uses a viewModel to disable fields that shouldn't be set, and
> that the label width was widened a bit;
> - also 'Margin' is 'Minimum Imbalance Improvement' in the UI
+1, is more comprehensible imo
>
> www/manager6/form/CRSOptions.js | 62 ++++++++++++++++++++++++++++++++-
> 1 file changed, 61 insertions(+), 1 deletion(-)
>
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
* Re: [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (27 preceding siblings ...)
2026-04-02 12:44 ` [PATCH manager v4 28/28] ui: form: add crs auto rebalancing options Daniel Kral
@ 2026-04-02 14:24 ` Dominik Rusovac
2026-04-02 16:07 ` applied: " Thomas Lamprecht
29 siblings, 0 replies; 41+ messages in thread
From: Dominik Rusovac @ 2026-04-02 14:24 UTC (permalink / raw)
To: Daniel Kral, pve-devel
Reviewed all of the patches from v1 up to v4. Tested the behavior
of the CRS in a 3-node cluster and in a 7-node cluster regarding:
* disarmed HA and re-armed HA
* maintenance mode of nodes
* fenced nodes
* affinity rules
Moreover:
* used all of the variations (static or dynamic with bruteforce or
topsis)
* played around with a bunch of different thresholds, margins and hold
durations for the purpose of fine tuning the scheduler
* verified that, for example, hostnames can include hyphens
* verified that minimum requirements for number fields are detected
* used UI for setting different auto-rebalance parameters
Observations:
* scoring of the best migration cannot happen in the same round as
enabling maintenance mode; obtained the warning:
"unable to score best balancing migration - leader 'ct:205' is not present in the cluster usage"
* the sustained imbalance round counter is not reset in case of early
returns, which can, e.g., cause an auto rebalance immediately after
re-arming HA or disabling maintenance mode
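The counter observation can be illustrated with a small hypothetical sketch (all names are invented; the real counter lives in the Perl HA manager, this only models the described behavior):

```javascript
// Hypothetical model of the reported issue: if an early return skipped
// the reset, the sustained-imbalance counter would keep its old value
// and a rebalance could fire right after the skip condition clears.
// Here the counter is reset on the early-return path as well.
function makeRebalancer(holdDuration) {
    let sustainedRounds = 0;
    // One scheduler round: returns true when a rebalance should fire.
    return function round(imbalanced, earlyReturn) {
        if (earlyReturn) {
            sustainedRounds = 0; // reset on early return too
            return false;
        }
        sustainedRounds = imbalanced ? sustainedRounds + 1 : 0;
        return sustainedRounds >= holdDuration;
    };
}
```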
Looks good to me overall, I think the tiny things related to my
observations could be fixed in a small follow-up.
On Thu Apr 2, 2026 at 2:43 PM CEST, Daniel Kral wrote:
> Here's the v4 of the load balancer patches for the HA Manager.
>
> Most of the patches here are already R-b'd by @Dominik (many, many
> thanks!) and only a few things have changed, the biggest of course is
> changing the default node imbalance threshold from '0.7' to '0.3' and
> adding the pve-manager patches.
>
> I'm already half-way there with the pve-docs patches, but will send them
> in a separate patch series (as the changes are also updating the CRS
> section in general).
>
> Thank you very much for the feedback @Dominik, @Thomas, @Maximiliano,
> and @Jillian Morgan!
>
[snip]
Consider this as:
Tested-by: Dominik Rusovac <d.rusovac@proxmox.com>
* applied: [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer
2026-04-02 12:43 [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Daniel Kral
` (28 preceding siblings ...)
2026-04-02 14:24 ` [PATCH cluster/ha-manager/manager v4 00/28] dynamic scheduler + load rebalancer Dominik Rusovac
@ 2026-04-02 16:07 ` Thomas Lamprecht
29 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2026-04-02 16:07 UTC (permalink / raw)
To: pve-devel, Daniel Kral
On Thu, 02 Apr 2026 14:43:54 +0200, Daniel Kral wrote:
> Here's the v4 of the load balancer patches for the HA Manager.
>
> Most of the patches here are already R-b'd by @Dominik (many, many
> thanks!) and only a few things have changed, the biggest of course is
> changing the default node imbalance threshold from '0.7' to '0.3' and
> adding the pve-manager patches.
>
> [...]
Applied, thanks to all involved, nice work!
cluster:
[1/3] datacenter config: restructure verbose description for the ha crs option
commit: 79cd0872a4dafa7bd480e2d70aca3757afd25e61
[2/3] datacenter config: add dynamic load scheduler option
commit: 871f0973bb6828247aa7ef2b72cca6565d84306d
[3/3] datacenter config: add auto rebalancing options
commit: f3f8347a1b2c343929e010bb7c9929098a226168
ha-manager:
[01/21] env: pve2: implement dynamic node and service stats
commit: addedabda082cd2fbc43ce114d3f62d1dab43c6e
[02/21] sim: hardware: pass correct types for static stats
commit: 6316e7e38ff2334fa63733622db9a96c834e1a05
[03/21] sim: hardware: factor out static stats' default values
commit: 2679bfb5eca9dc978b1d03a8ef094a090b08f4b8
[04/21] sim: hardware: fix static stats guard
commit: a9d1210db90eca0c06959c1df90d8035d3e01937
[05/21] sim: hardware: handle dynamic service stats
commit: 55375917fdd90d8d4457d686161e937cffc7e330
[06/21] sim: hardware: add set-dynamic-stats command
commit: c334bb4df905879c88ea75f5a5be43c3dd98bcec
[07/21] sim: hardware: add getters for dynamic {node,service} stats
commit: 75235b476937b2bf85f136b257d193d48231164d
[08/21] usage: pass service data to add_service_usage
commit: 7af6ee02a9f3c31adc47c6a0c5531eff9545dea3
[09/21] usage: pass service data to get_used_service_nodes
commit: 1ffe83333bb11981f9d4642a9d82a2c28c649f73
[10/21] add running flag to non-HA cluster service stats
commit: 54789d6b162d30e093522f6adbc224168c427877
[11/21] usage: use add_service to add service usage to nodes
commit: 9780600e3539f0851873f45cec6ac33ce7220212
[12/21] usage: add dynamic usage scheduler
commit: 6684f186212cf66ea54c7f6115778eb779ed3322
[13/21] test: add dynamic usage scheduler test cases
commit: c377eacd2022250bd6b229fed1b33c4f9b1c456e
[14/21] manager: rename execute_migration to queue_resource_motion
commit: 4c87446560bcea0bd9ed2d05c1b5ff3c561e093d
[15/21] manager: update_crs_scheduler_mode: factor out crs config
commit: 55cfbf0ac35448aa246ede4a07e12f95a09ada4e
[16/21] implement automatic rebalancing
commit: f0f21bc1c547e578cfb520d134d38530c759119c
[17/21] test: add resource bundle generation test cases
commit: 8c0f2312561a59cbafc5c910b36684a6c82eedc3
[18/21] test: add dynamic automatic rebalancing system test cases
commit: 36813aca2f1a1d0e1fcb839f611b8ea2f5039f26
[19/21] test: add static automatic rebalancing system test cases
commit: ada49c44a2e49bcb01c24c2828dd31ab0e27fff6
[20/21] test: add automatic rebalancing system test cases with TOPSIS method
commit: 1419ec503b5b9eaf4bfabf7a38ed3ee80b101234
[21/21] test: add automatic rebalancing system test cases with affinity rules
commit: 6699b3e5a192c50a380391a32bdd613340f16751
manager:
[1/4] ui: dc/options: make the ha crs strings translatable
commit: c3b5bbe4779a1e63cebd190c833ffe5f032d6d5f
[2/4] ui: dc/options: add dynamic load scheduler option for ha crs
commit: c4f15682a44f7494f9e89cdbd09875c7c75c209a
[3/4] ui: move cluster resource scheduling from dc/options into separate component
commit: 8a032a0c1dfec068c7abc1b40327e03f6568e731
[4/4] ui: form: add crs auto rebalancing options
commit: 4557bdf8155b8fc5e3e17b18e980ddda9e78b1d4