* [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:10 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
` (38 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This makes the move of the function into its own module in the next
patch easier to follow; that move is in turn needed to generalize
score_nodes_to_start_service(...) for other usage stats in the
following patches.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/pve_static.rs | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index b81086dd..fd5e5ffc 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -94,7 +94,11 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
for (index, node) in nodes.iter().enumerate() {
let node = node.as_ref();
let new_cpu = if index == target_index {
- add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
+ if service.maxcpu == 0.0 {
+ node.cpu + node.maxcpu as f64
+ } else {
+ node.cpu + service.maxcpu
+ }
} else {
node.cpu
} / (node.maxcpu as f64);
--
2.47.3
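As a standalone illustration (not part of the patch itself), the inlined computation treats a service `maxcpu` of `0.0` as "unlimited" and charges the node's full core count instead of the service's CPU limit; the helper name below is hypothetical:

```rust
// Hypothetical standalone sketch of the inlined CPU-usage computation.
// A service `maxcpu` of 0.0 means "unlimited", so the whole node's core
// count is added instead of the service's CPU limit.
fn new_cpu_fraction(node_cpu: f64, node_maxcpu: f64, service_maxcpu: f64) -> f64 {
    let added = if service_maxcpu == 0.0 {
        node_cpu + node_maxcpu // unlimited: assume full node utilization
    } else {
        node_cpu + service_maxcpu
    };
    added / node_maxcpu // normalize to a fraction of the node's cores
}

fn main() {
    // 2-core service on a 16-core node already using 1.0 cores: (1 + 2) / 16
    assert!((new_cpu_fraction(1.0, 16.0, 2.0) - 0.1875).abs() < 1e-12);
    // unlimited service on the same node: (1 + 16) / 16
    assert!((new_cpu_fraction(1.0, 16.0, 0.0) - 1.0625).abs() < 1e-12);
    println!("ok");
}
```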
^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-26 10:10 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:10 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This makes moving the function out into its own module easier to follow,
> which in turn is needed to generalize score_nodes_to_start_service(...)
> for other usage stats in the following patches.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> proxmox-resource-scheduling/src/pve_static.rs | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
> index b81086dd..fd5e5ffc 100644
> --- a/proxmox-resource-scheduling/src/pve_static.rs
> +++ b/proxmox-resource-scheduling/src/pve_static.rs
> @@ -94,7 +94,11 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
> for (index, node) in nodes.iter().enumerate() {
> let node = node.as_ref();
> let new_cpu = if index == target_index {
> - add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
> + if service.maxcpu == 0.0 {
> + node.cpu + node.maxcpu as f64
> + } else {
> + node.cpu + service.maxcpu
> + }
> } else {
> node.cpu
> } / (node.maxcpu as f64);
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:11 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
` (37 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is done so score_nodes_to_start_service(...) can be generalized in
the following patches, allowing other usage stat structs to reuse the
same scoring method.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add patch message
- do not change visibility of pve_static::add_cpu_usage() (hence the
  inlining of the code in the previous patch)
proxmox-resource-scheduling/src/lib.rs | 2 +
proxmox-resource-scheduling/src/pve_static.rs | 76 +---------------
proxmox-resource-scheduling/src/scheduler.rs | 90 +++++++++++++++++++
3 files changed, 94 insertions(+), 74 deletions(-)
create mode 100644 proxmox-resource-scheduling/src/scheduler.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 47980259..c73e7b1e 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,4 +1,6 @@
#[macro_use]
pub mod topsis;
+pub mod scheduler;
+
pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index fd5e5ffc..5df0be37 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,7 +1,7 @@
use anyhow::Error;
use serde::{Deserialize, Serialize};
-use crate::topsis;
+use crate::scheduler;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -53,23 +53,6 @@ pub struct StaticServiceUsage {
pub maxmem: usize,
}
-criteria_struct! {
- /// A given alternative.
- struct PveTopsisAlternative {
- #[criterion("average CPU", -1.0)]
- average_cpu: f64,
- #[criterion("highest CPU", -2.0)]
- highest_cpu: f64,
- #[criterion("average memory", -5.0)]
- average_memory: f64,
- #[criterion("highest memory", -10.0)]
- highest_memory: f64,
- }
-
- const N_CRITERIA;
- static PVE_HA_TOPSIS_CRITERIA;
-}
-
/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
/// and CPU usages of the nodes as if the service would already be running on each.
///
@@ -79,60 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- let len = nodes.len();
-
- let matrix = nodes
- .iter()
- .enumerate()
- .map(|(target_index, _)| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for (index, node) in nodes.iter().enumerate() {
- let node = node.as_ref();
- let new_cpu = if index == target_index {
- if service.maxcpu == 0.0 {
- node.cpu + node.maxcpu as f64
- } else {
- node.cpu + service.maxcpu
- }
- } else {
- node.cpu
- } / (node.maxcpu as f64);
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = if index == target_index {
- node.mem + service.maxmem
- } else {
- node.mem
- } as f64
- / node.maxmem as f64;
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
- .into()
- })
- .collect::<Vec<_>>();
-
- let scores =
- topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
- Ok(scores
- .into_iter()
- .enumerate()
- .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
- .collect())
+ scheduler::score_nodes_to_start_service(nodes, service)
}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
new file mode 100644
index 00000000..385015e3
--- /dev/null
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -0,0 +1,90 @@
+use anyhow::Error;
+
+use crate::{
+ pve_static::{StaticNodeUsage, StaticServiceUsage},
+ topsis,
+};
+
+criteria_struct! {
+ /// A given alternative.
+ struct PveTopsisAlternative {
+ #[criterion("average CPU", -1.0)]
+ average_cpu: f64,
+ #[criterion("highest CPU", -2.0)]
+ highest_cpu: f64,
+ #[criterion("average memory", -5.0)]
+ average_memory: f64,
+ #[criterion("highest memory", -10.0)]
+ highest_memory: f64,
+ }
+
+ const N_CRITERIA;
+ static PVE_HA_TOPSIS_CRITERIA;
+}
+
+/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the service would already be running on each.
+///
+/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
+/// is better.
+pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+ nodes: &[T],
+ service: &StaticServiceUsage,
+) -> Result<Vec<(String, f64)>, Error> {
+ let len = nodes.len();
+
+ let matrix = nodes
+ .iter()
+ .enumerate()
+ .map(|(target_index, _)| {
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for (index, node) in nodes.iter().enumerate() {
+ let node = node.as_ref();
+ let new_cpu = if index == target_index {
+ if service.maxcpu == 0.0 {
+ node.cpu + node.maxcpu as f64
+ } else {
+ node.cpu + service.maxcpu
+ }
+ } else {
+ node.cpu
+ } / (node.maxcpu as f64);
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = if index == target_index {
+ node.mem + service.maxmem
+ } else {
+ node.mem
+ } as f64
+ / node.maxmem as f64;
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let scores =
+ topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(scores
+ .into_iter()
+ .enumerate()
+ .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
+ .collect())
+}
--
2.47.3
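The "Add 1.0" comment in the moved code can be checked numerically; a standalone sketch (not part of the patch), showing how the offset keeps tiny absolute load differences from being boosted into large relative differences:

```rust
// Standalone illustration of the "Add 1.0" offset in the scoring code:
// without it, tiny absolute differences in load turn into large relative
// differences between alternatives.
fn main() {
    let (low, high) = (0.002_f64, 0.004_f64);
    // Raw ratio: 0.004 looks twice as loaded as 0.002.
    assert!((high / low - 2.0).abs() < 1e-12);
    // With the offset, the ratio shrinks to just above 1.0.
    let ratio = (1.0 + high) / (1.0 + low);
    assert!(ratio > 1.0 && ratio < 1.01);
    println!("ok");
}
```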
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 01/40] resource-scheduling: inline add_cpu_usage in score_nodes_to_start_service Daniel Kral
2026-03-24 18:29 ` [PATCH proxmox v2 02/40] resource-scheduling: move score_nodes_to_start_service to scheduler crate Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:12 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
` (36 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The term `resource` is more appropriate with respect to the crate name
and is also the preferred name for the current main application in the
HA Manager.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/pve_static.rs | 2 +-
proxmox-resource-scheduling/src/scheduler.rs | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index 5df0be37..c7e1d1b1 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -62,5 +62,5 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- scheduler::score_nodes_to_start_service(nodes, service)
+ scheduler::score_nodes_to_start_resource(nodes, service)
}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 385015e3..39ee44ce 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -22,14 +22,14 @@ criteria_struct! {
static PVE_HA_TOPSIS_CRITERIA;
}
-/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the service would already be running on each.
+/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the resource would already be running on each.
///
/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
/// is better.
-pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
+pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
- service: &StaticServiceUsage,
+ resource: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
let len = nodes.len();
@@ -46,10 +46,10 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
for (index, node) in nodes.iter().enumerate() {
let node = node.as_ref();
let new_cpu = if index == target_index {
- if service.maxcpu == 0.0 {
+ if resource.maxcpu == 0.0 {
node.cpu + node.maxcpu as f64
} else {
- node.cpu + service.maxcpu
+ node.cpu + resource.maxcpu
}
} else {
node.cpu
@@ -58,7 +58,7 @@ pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
squares_cpu += new_cpu.powi(2);
let new_mem = if index == target_index {
- node.mem + service.maxmem
+ node.mem + resource.maxmem
} else {
node.mem
} as f64
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (2 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 03/40] resource-scheduling: rename service to resource where appropriate Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:19 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
` (35 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The existing score_nodes_to_start_resource(...) function depends on the
StaticNodeUsage and StaticServiceUsage structs.
To use this function for other usage stats structs as well, declare
generic NodeStats and ResourceStats structs that callers can convert
their own types into. These are used to make
score_nodes_to_start_resource(...) and its documentation generic.
The pve_static::score_nodes_to_start_service(...) is marked as
deprecated accordingly. The usage-related structs are marked as
deprecated as well, since the specific usage implementations - including
their serialization and deserialization - should be handled by the
caller now.
This is best viewed with the git option --ignore-all-space.
No functional changes intended.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
This patch was "[RFC proxmox 2/5] resource-scheduling: introduce generic
cluster usage implementation" in the RFC v1. Sorry for the ambiguous
naming with the next patch!
changes v1 -> v2:
- add more information to the patch message
- split out `NodeStats` and `ResourceStats` to their own modules, which
will also be used in an upcoming patch for the `Usage` implementation
- add deprecation note to pve_static::score_nodes_to_start_service()
- add deprecation attribute to other major pve_static items as well
- impl `Add` and `Sum` for ResourceStats, which will be used for the
resource bundling in pve-rs later
- eagerly implement common traits (especially Clone and Debug)
- add test cases for the
scheduler::Scheduler::score_nodes_to_start_resource()
proxmox-resource-scheduling/src/lib.rs | 6 +
proxmox-resource-scheduling/src/node.rs | 39 ++++
proxmox-resource-scheduling/src/pve_static.rs | 46 +++-
proxmox-resource-scheduling/src/resource.rs | 33 +++
proxmox-resource-scheduling/src/scheduler.rs | 157 ++++++++------
.../tests/scheduler.rs | 200 ++++++++++++++++++
6 files changed, 408 insertions(+), 73 deletions(-)
create mode 100644 proxmox-resource-scheduling/src/node.rs
create mode 100644 proxmox-resource-scheduling/src/resource.rs
create mode 100644 proxmox-resource-scheduling/tests/scheduler.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index c73e7b1e..12b743fe 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -1,6 +1,12 @@
#[macro_use]
pub mod topsis;
+pub mod node;
+pub mod resource;
+
pub mod scheduler;
+// pve_static exists only for backwards compatibility to not break builds
+// The allow(deprecated) is to not report its own use of deprecated items
+#[allow(deprecated)]
pub mod pve_static;
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
new file mode 100644
index 00000000..e6227eda
--- /dev/null
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -0,0 +1,39 @@
+use crate::resource::ResourceStats;
+
+/// Usage statistics of a node.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct NodeStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Total number of CPU cores.
+ pub maxcpu: usize,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Total memory in bytes.
+ pub maxmem: usize,
+}
+
+impl NodeStats {
+ /// Adds the resource stats to the node stats as if the resource has started on the node.
+ pub fn add_started_resource(&mut self, resource_stats: &ResourceStats) {
+ // a maxcpu value of `0.0` means no cpu usage limit on the node
+ let resource_cpu = if resource_stats.maxcpu == 0.0 {
+ self.maxcpu as f64
+ } else {
+ resource_stats.maxcpu
+ };
+
+ self.cpu += resource_cpu;
+ self.mem += resource_stats.maxmem;
+ }
+
+ /// Returns the current cpu usage as a percentage.
+ pub fn cpu_load(&self) -> f64 {
+ self.cpu / self.maxcpu as f64
+ }
+
+ /// Returns the current memory usage as a percentage.
+ pub fn mem_load(&self) -> f64 {
+ self.mem as f64 / self.maxmem as f64
+ }
+}
diff --git a/proxmox-resource-scheduling/src/pve_static.rs b/proxmox-resource-scheduling/src/pve_static.rs
index c7e1d1b1..229ee3c6 100644
--- a/proxmox-resource-scheduling/src/pve_static.rs
+++ b/proxmox-resource-scheduling/src/pve_static.rs
@@ -1,10 +1,12 @@
use anyhow::Error;
use serde::{Deserialize, Serialize};
-use crate::scheduler;
+use crate::scheduler::{NodeUsage, Scheduler};
+use crate::{node::NodeStats, resource::ResourceStats};
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
+#[deprecated = "specific node usage structs should be declared where they are used"]
/// Static usage information of a node.
pub struct StaticNodeUsage {
/// Hostname of the node.
@@ -33,6 +35,22 @@ impl AsRef<StaticNodeUsage> for StaticNodeUsage {
}
}
+impl From<StaticNodeUsage> for NodeUsage {
+ fn from(usage: StaticNodeUsage) -> Self {
+ let stats = NodeStats {
+ cpu: usage.cpu,
+ maxcpu: usage.maxcpu,
+ mem: usage.mem,
+ maxmem: usage.maxmem,
+ };
+
+ Self {
+ name: usage.name,
+ stats,
+ }
+ }
+}
+
/// Calculate new CPU usage in percent.
/// `add` being `0.0` means "unlimited" and results in `max` being added.
fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
@@ -43,8 +61,9 @@ fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
}
}
-#[derive(Serialize, Deserialize)]
+#[derive(Clone, Copy, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
+#[deprecated = "specific service usage structs should be declared where they are used"]
/// Static usage information of an HA resource.
pub struct StaticServiceUsage {
/// Number of assigned CPUs or CPU limit.
@@ -53,14 +72,33 @@ pub struct StaticServiceUsage {
pub maxmem: usize,
}
+impl From<StaticServiceUsage> for ResourceStats {
+ fn from(usage: StaticServiceUsage) -> Self {
+ Self {
+ cpu: usage.maxcpu,
+ maxcpu: usage.maxcpu,
+ mem: usage.maxmem,
+ maxmem: usage.maxmem,
+ }
+ }
+}
+
/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
/// and CPU usages of the nodes as if the service would already be running on each.
///
/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
/// is better.
+#[deprecated = "use Scheduler::score_nodes_to_start_resource(...) directly instead"]
pub fn score_nodes_to_start_service<T: AsRef<StaticNodeUsage>>(
nodes: &[T],
service: &StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
- scheduler::score_nodes_to_start_resource(nodes, service)
+ let nodes = nodes
+ .iter()
+ .map(|node| node.as_ref().clone().into())
+ .collect::<Vec<NodeUsage>>();
+
+ let scheduler = Scheduler::from_nodes(nodes);
+
+ scheduler.score_nodes_to_start_resource(*service)
}
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
new file mode 100644
index 00000000..1eb9d15e
--- /dev/null
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -0,0 +1,33 @@
+use std::{iter::Sum, ops::Add};
+
+/// Usage statistics for a resource.
+#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
+pub struct ResourceStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+}
+
+impl Add for ResourceStats {
+ type Output = Self;
+
+ fn add(self, other: Self) -> Self {
+ Self {
+ cpu: self.cpu + other.cpu,
+ maxcpu: self.maxcpu + other.maxcpu,
+ mem: self.mem + other.mem,
+ maxmem: self.maxmem + other.maxmem,
+ }
+ }
+}
+
+impl Sum for ResourceStats {
+ fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
+ iter.fold(Self::default(), |a, b| a + b)
+ }
+}
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 39ee44ce..bb38f238 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -1,9 +1,15 @@
use anyhow::Error;
-use crate::{
- pve_static::{StaticNodeUsage, StaticServiceUsage},
- topsis,
-};
+use crate::{node::NodeStats, resource::ResourceStats, topsis};
+
+/// The scheduler view of a node.
+#[derive(Clone, Debug)]
+pub struct NodeUsage {
+ /// The identifier of the node.
+ pub name: String,
+ /// The usage statistics of the node.
+ pub stats: NodeStats,
+}
criteria_struct! {
/// A given alternative.
@@ -22,69 +28,82 @@ criteria_struct! {
static PVE_HA_TOPSIS_CRITERIA;
}
-/// Scores candidate `nodes` to start a `resource` on. Scoring is done according to the static memory
-/// and CPU usages of the nodes as if the resource would already be running on each.
-///
-/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
-/// is better.
-pub fn score_nodes_to_start_resource<T: AsRef<StaticNodeUsage>>(
- nodes: &[T],
- resource: &StaticServiceUsage,
-) -> Result<Vec<(String, f64)>, Error> {
- let len = nodes.len();
-
- let matrix = nodes
- .iter()
- .enumerate()
- .map(|(target_index, _)| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for (index, node) in nodes.iter().enumerate() {
- let node = node.as_ref();
- let new_cpu = if index == target_index {
- if resource.maxcpu == 0.0 {
- node.cpu + node.maxcpu as f64
- } else {
- node.cpu + resource.maxcpu
- }
- } else {
- node.cpu
- } / (node.maxcpu as f64);
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = if index == target_index {
- node.mem + resource.maxmem
- } else {
- node.mem
- } as f64
- / node.maxmem as f64;
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
- .into()
- })
- .collect::<Vec<_>>();
-
- let scores =
- topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
-
- Ok(scores
- .into_iter()
- .enumerate()
- .map(|(n, score)| (nodes[n].as_ref().name.clone(), score))
- .collect())
+pub struct Scheduler {
+ nodes: Vec<NodeUsage>,
+}
+
+impl Scheduler {
+ /// Instantiate scheduler instance from node usages.
+ pub fn from_nodes<I>(nodes: I) -> Self
+ where
+ I: IntoIterator<Item: Into<NodeUsage>>,
+ {
+ Self {
+ nodes: nodes.into_iter().map(|node| node.into()).collect(),
+ }
+ }
+
+ /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
+ ///
+ /// The scoring is done as if the resource is already started on each node. This assumes that
+ /// the already started resource consumes the maximum amount of each stat according to its
+ /// `resource_stats`.
+ ///
+ /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
+ /// score is better.
+ pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
+ &self,
+ resource_stats: T,
+ ) -> Result<Vec<(String, f64)>, Error> {
+ let len = self.nodes.len();
+ let resource_stats = resource_stats.into();
+
+ let matrix = self
+ .nodes
+ .iter()
+ .enumerate()
+ .map(|(target_index, _)| {
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for (index, node) in self.nodes.iter().enumerate() {
+ let mut new_stats = node.stats;
+
+ if index == target_index {
+ new_stats.add_started_resource(&resource_stats)
+ };
+
+ let new_cpu = new_stats.cpu_load();
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = new_stats.mem_load();
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let scores =
+ topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(scores
+ .into_iter()
+ .enumerate()
+ .map(|(n, score)| (self.nodes[n].name.to_string(), score))
+ .collect())
+ }
}
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
new file mode 100644
index 00000000..c7a9dab9
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -0,0 +1,200 @@
+use anyhow::Error;
+use proxmox_resource_scheduling::{
+ node::NodeStats,
+ resource::ResourceStats,
+ scheduler::{NodeUsage, Scheduler},
+};
+
+fn new_homogeneous_cluster_scheduler() -> Scheduler {
+ let (maxcpu, maxmem) = (16, 64 * (1 << 30));
+
+ let node1 = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu,
+ mem: 12334 << 20,
+ maxmem,
+ },
+ };
+
+ let node2 = NodeUsage {
+ name: String::from("node2"),
+ stats: NodeStats {
+ cpu: 15.184,
+ maxcpu,
+ mem: 529 << 20,
+ maxmem,
+ },
+ };
+
+ let node3 = NodeUsage {
+ name: String::from("node3"),
+ stats: NodeStats {
+ cpu: 5.2,
+ maxcpu,
+ mem: 9381 << 20,
+ maxmem,
+ },
+ };
+
+ Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn new_heterogeneous_cluster_scheduler() -> Scheduler {
+ let node1 = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu: 16,
+ mem: 12334 << 20,
+ maxmem: 128 << 30,
+ },
+ };
+
+ let node2 = NodeUsage {
+ name: String::from("node2"),
+ stats: NodeStats {
+ cpu: 15.184,
+ maxcpu: 32,
+ mem: 529 << 20,
+ maxmem: 96 << 30,
+ },
+ };
+
+ let node3 = NodeUsage {
+ name: String::from("node3"),
+ stats: NodeStats {
+ cpu: 5.2,
+ maxcpu: 24,
+ mem: 9381 << 20,
+ maxmem: 64 << 30,
+ },
+ };
+
+ Scheduler::from_nodes(vec![node1, node2, node3])
+}
+
+fn rank_nodes_to_start_resource(
+ scheduler: &Scheduler,
+ resource_stats: ResourceStats,
+) -> Result<Vec<String>, Error> {
+ let mut alternatives = scheduler.score_nodes_to_start_resource(resource_stats)?;
+
+ alternatives.sort_by(|a, b| b.1.total_cmp(&a.1));
+
+ Ok(alternatives
+ .iter()
+ .map(|alternative| alternative.0.to_string())
+ .collect())
+}
+
+#[test]
+fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ let heavy_memory_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 1.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+ vec!["node2", "node3", "node1"]
+ );
+
+ let heavy_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 0.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node2", "node3", "node1"]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ let heavy_memory_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 1.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
+ vec!["node2", "node1", "node3"]
+ );
+
+ let heavy_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
+ vec!["node3", "node2", "node1"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 0.0,
+ mem: 0,
+ maxmem: 0,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node1", "node3", "node2"]
+ );
+
+ let unlimited_cpu_resource_stats = ResourceStats {
+ cpu: 0.0,
+ maxcpu: 12.0,
+ mem: 0,
+ maxmem: 12 << 30,
+ };
+
+ assert_eq!(
+ rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
+ vec!["node2", "node1", "node3"]
+ );
+
+ Ok(())
+}
--
2.47.3
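The `Add`/`Sum` impls added for ResourceStats make bundling a set of resources a one-liner via `.sum()`. A minimal standalone re-implementation sketch (field-wise summation, mirroring the patch; the struct here is a local copy, not the crate's type):

```rust
use std::{iter::Sum, ops::Add};

// Minimal standalone re-implementation of the ResourceStats Add/Sum impls
// from the patch, showing how bundled resources sum field-wise.
#[derive(Copy, Clone, Debug, Default, PartialEq)]
struct ResourceStats {
    cpu: f64,
    maxcpu: f64,
    mem: usize,
    maxmem: usize,
}

impl Add for ResourceStats {
    type Output = Self;
    fn add(self, other: Self) -> Self {
        Self {
            cpu: self.cpu + other.cpu,
            maxcpu: self.maxcpu + other.maxcpu,
            mem: self.mem + other.mem,
            maxmem: self.maxmem + other.maxmem,
        }
    }
}

impl Sum for ResourceStats {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        iter.fold(Self::default(), |a, b| a + b)
    }
}

fn main() {
    // Bundle two resources, e.g. a VM and its helper container.
    let bundle: ResourceStats = [
        ResourceStats { cpu: 0.5, maxcpu: 2.0, mem: 1 << 30, maxmem: 2 << 30 },
        ResourceStats { cpu: 1.5, maxcpu: 4.0, mem: 2 << 30, maxmem: 4 << 30 },
    ]
    .into_iter()
    .sum();

    assert_eq!(bundle.maxcpu, 6.0);
    assert_eq!(bundle.maxmem, 6 << 30);
    println!("ok");
}
```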
^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-26 10:19 ` Dominik Rusovac
2026-03-26 14:16 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:19 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly nits.
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The existing score_nodes_to_start_resource(...) function is dependent on
> the StaticNodeUsage and StaticServiceUsage structs.
>
> To use this function for other usage stats structs as well, declare
> generic NodeStats and ResourceStats structs, that the users can convert
> into. These are used to make score_nodes_to_start_resource(...) and its
> documentation generic.
>
> The pve_static::score_nodes_to_start_service(...) is marked as
> deprecated accordingly. The usage-related structs are marked as
> deprecated as well as the specific usage implementations - including
> their serialization and deserialization - should be handled by the
> caller now.
>
> This is best viewed with the git option --ignore-all-space.
>
> No functional changes intended.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> This patch was "[RFC proxmox 2/5] resource-scheduling: introduce generic
> cluster usage implementation" in the RFC v1. Sorry for the ambiguous
> naming with the next patch!
>
> changes v1 -> v2:
> - add more information to the patch message
> - split out `NodeStats` and `ResourceStats` to their own modules, which
> will also be used in an upcoming patch for the `Usage` implementation
> - add deprecation note to pve_static::score_nodes_to_start_service()
> - add deprecation attribute to other major pve_static items as well
> - impl `Add` and `Sum` for ResourceStats, which will be used for the
> resource bundling in pve-rs later
> - eagerly implement common traits (especially Clone and Debug)
> - add test cases for the
> scheduler::Scheduler::score_nodes_to_start_resource()
[snip]
good to have:
#[derive(Clone, Debug)]
> +pub struct Scheduler {
> + nodes: Vec<NodeUsage>,
> +}
> +
nit: The implementation of `Scheduler` is totally fine as-is. This is
just my two cents, as this was mentioned off-list.
I believe that for the implementation of the scheduler working with
enum variants and a trait and then exploiting static dispatch is more
convenient and easier to maintain, e.g.:
pub enum Schedulerr<Nodes> {
Topsis(Nodes),
BruteForce(Nodes),
}
pub trait Decide {
fn node_imbalance(&self) -> f64;
fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
fn score_best_balancing_migration_candidates(
&self,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error>;
}
impl Decide for Schedulerr<Vec<NodeUsage>> {
fn node_imbalance(&self) -> f64 {
match self {
Self::Topsis(nodes) | Self::BruteForce(nodes) => {
calculate_node_imbalance(nodes, |node| node.stats.load())
}
}
}
fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64 {
match self {
Self::Topsis(nodes) | Self::BruteForce(nodes) => {
calculate_node_imbalance(nodes, |node| {
let mut new_stats = node.stats;
if node.name == candidate.migration.source_node {
new_stats.remove_running_resource(&candidate.stats);
} else if node.name == candidate.migration.target_node {
new_stats.add_running_resource(&candidate.stats);
}
new_stats.load()
})
}
}
}
fn score_best_balancing_migration_candidates(
&self,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error> {
match self {
Self::Topsis(nodes) => {
let len = nodes.len();
let matrix = candidates
.iter()
.map(|candidate| {
let resource_stats = &candidate.stats;
let source_node = &candidate.migration.source_node;
let target_node = &candidate.migration.target_node;
let mut highest_cpu = 0.0;
let mut squares_cpu = 0.0;
let mut highest_mem = 0.0;
let mut squares_mem = 0.0;
for node in nodes.iter() {
let new_stats = {
let mut new_stats = node.stats;
if &node.name == source_node {
new_stats.remove_running_resource(resource_stats);
} else if &node.name == target_node {
new_stats.add_running_resource(resource_stats);
}
new_stats
};
let new_cpu = new_stats.cpu_load();
highest_cpu = f64::max(highest_cpu, new_cpu);
squares_cpu += new_cpu.powi(2);
let new_mem = new_stats.mem_load();
highest_mem = f64::max(highest_mem, new_mem);
squares_mem += new_mem.powi(2);
}
// Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
// 1.004 is only slightly more than 1.002.
PveTopsisAlternative {
average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
highest_cpu: 1.0 + highest_cpu,
average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
highest_memory: 1.0 + highest_mem,
}
.into()
})
.collect::<Vec<_>>();
let best_alternatives = topsis::rank_alternatives(
&topsis::Matrix::new(matrix)?,
&PVE_HA_TOPSIS_CRITERIA,
)?;
Ok(best_alternatives
.into_iter()
.take(limit)
.map(|i| {
let imbalance =
self.node_imbalance_with_migration_candidate(&candidates[i]);
ScoredMigration::new(candidates[i].clone(), imbalance)
})
.collect())
}
Self::BruteForce(_) => {
let mut scored_migrations = candidates
.iter()
.map(|candidate| {
let imbalance = self.node_imbalance_with_migration_candidate(candidate);
// NOTE: could avoid clone if Migration had additional score field
Reverse(ScoredMigration::new(candidate.clone(), imbalance))
})
.collect::<BinaryHeap<_>>();
let mut best_migrations = Vec::with_capacity(limit);
// BinaryHeap::into_iter_sorted() is still in nightly unfortunately
while best_migrations.len() < limit {
match scored_migrations.pop() {
Some(Reverse(alternative)) => best_migrations.push(alternative),
None => break,
}
}
Ok(best_migrations)
}
}
}
}
pub fn score_best_balancing_migration_candidates(
scheduler: &impl Decide,
candidates: &[MigrationCandidate],
limit: usize,
) -> Result<Vec<ScoredMigration>, Error> {
scheduler.score_best_balancing_migration_candidates(candidates, limit)
}
In a nutshell, this declares what a scheduler ought to be able to do to be used
for scoring (that is, implementing the `Decide` trait); and implements
all the functionality for all the variants in one place.
Nice side-effects of this design:
* one scoring function implements all the variants in one place, which is nice, I think
* adding/removing a scheduler variant would become more systematic
* modifying scheduler variants in terms of how they score or, for example, how they measure
imbalance, would also be more straightforward
Again, just my two cents.
> +impl Scheduler {
> + /// Instantiate scheduler instance from node usages.
> + pub fn from_nodes<I>(nodes: I) -> Self
> + where
> + I: IntoIterator<Item: Into<NodeUsage>>,
> + {
> + Self {
> + nodes: nodes.into_iter().map(|node| node.into()).collect(),
> + }
> + }
> +
> + /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
> + ///
> + /// The scoring is done as if the resource is already started on each node. This assumes that
> + /// the already started resource consumes the maximum amount of each stat according to its
> + /// `resource_stats`.
> + ///
> + /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
> + /// score is better.
> + pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
> + &self,
> + resource_stats: T,
> + ) -> Result<Vec<(String, f64)>, Error> {
> + let len = self.nodes.len();
> + let resource_stats = resource_stats.into();
> +
> + let matrix = self
> + .nodes
> + .iter()
> + .enumerate()
> + .map(|(target_index, _)| {
> + // Base values on percentages to allow comparing nodes with different stats.
> + let mut highest_cpu = 0.0;
> + let mut squares_cpu = 0.0;
> + let mut highest_mem = 0.0;
> + let mut squares_mem = 0.0;
> +
> + for (index, node) in self.nodes.iter().enumerate() {
> + let mut new_stats = node.stats;
> +
> + if index == target_index {
> + new_stats.add_started_resource(&resource_stats)
> + };
> +
> + let new_cpu = new_stats.cpu_load();
> + highest_cpu = f64::max(highest_cpu, new_cpu);
> + squares_cpu += new_cpu.powi(2);
> +
> + let new_mem = new_stats.mem_load();
> + highest_mem = f64::max(highest_mem, new_mem);
> + squares_mem += new_mem.powi(2);
> + }
> +
> + // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
> + // 1.004 is only slightly more than 1.002.
> + PveTopsisAlternative {
> + average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
> + highest_cpu: 1.0 + highest_cpu,
> + average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
> + highest_memory: 1.0 + highest_mem,
> + }
> + .into()
> + })
> + .collect::<Vec<_>>();
> +
> + let scores =
> + topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
> +
> + Ok(scores
> + .into_iter()
> + .enumerate()
> + .map(|(n, score)| (self.nodes[n].name.to_string(), score))
> + .collect())
> + }
> }
[snip]
in the future, the proptest crate could come in handy for such helper
functions.
in general, I would propose to add proptests in the future.
for some examples and inspiration, see [0].
[0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/
> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
> + let (maxcpu, maxmem) = (16, 64 * (1 << 30));
> +
> + let node1 = NodeUsage {
> + name: String::from("node1"),
> + stats: NodeStats {
> + cpu: 1.7,
> + maxcpu,
> + mem: 12334 << 20,
> + maxmem,
> + },
> + };
> +
> + let node2 = NodeUsage {
> + name: String::from("node2"),
> + stats: NodeStats {
> + cpu: 15.184,
> + maxcpu,
> + mem: 529 << 20,
> + maxmem,
> + },
> + };
> +
> + let node3 = NodeUsage {
> + name: String::from("node3"),
> + stats: NodeStats {
> + cpu: 5.2,
> + maxcpu,
> + mem: 9381 << 20,
> + maxmem,
> + },
> + };
> +
> + Scheduler::from_nodes(vec![node1, node2, node3])
> +}
> +
> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
> + let node1 = NodeUsage {
> + name: String::from("node1"),
> + stats: NodeStats {
> + cpu: 1.7,
> + maxcpu: 16,
> + mem: 12334 << 20,
> + maxmem: 128 << 30,
> + },
> + };
> +
> + let node2 = NodeUsage {
> + name: String::from("node2"),
> + stats: NodeStats {
> + cpu: 15.184,
> + maxcpu: 32,
> + mem: 529 << 20,
> + maxmem: 96 << 30,
> + },
> + };
> +
> + let node3 = NodeUsage {
> + name: String::from("node3"),
> + stats: NodeStats {
> + cpu: 5.2,
> + maxcpu: 24,
> + mem: 9381 << 20,
> + maxmem: 64 << 30,
> + },
> + };
> +
> + Scheduler::from_nodes(vec![node1, node2, node3])
> +}
[snip]
> +#[test]
> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
> + let scheduler = new_homogeneous_cluster_scheduler();
> +
> + let heavy_memory_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 1.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
> + vec!["node2", "node3", "node1"]
> + );
> +
> + let heavy_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 0.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
nit: confusing variable name
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
nit: confusing variable name
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node2", "node3", "node1"]
> + );
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
> + let scheduler = new_heterogeneous_cluster_scheduler();
> +
> + let heavy_memory_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 1.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
> + vec!["node2", "node1", "node3"]
> + );
> +
> + let heavy_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
> + vec!["node3", "node2", "node1"]
> + );
> +
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 0.0,
> + mem: 0,
> + maxmem: 0,
> + };
> +
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node1", "node3", "node2"]
> + );
> +
nit: confusing variable name
> + let unlimited_cpu_resource_stats = ResourceStats {
> + cpu: 0.0,
> + maxcpu: 12.0,
> + mem: 0,
> + maxmem: 12 << 30,
> + };
> +
nit: confusing variable name
> + assert_eq!(
> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
> + vec!["node2", "node1", "node3"]
> + );
> +
> + Ok(())
> +}
^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation
2026-03-26 10:19 ` Dominik Rusovac
@ 2026-03-26 14:16 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:16 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:19 AM CET, Dominik Rusovac wrote:
> [snip]
>
> good to have:
>
> #[derive(Clone, Debug)]
ACK, thanks!
>
>> +pub struct Scheduler {
>> + nodes: Vec<NodeUsage>,
>> +}
>> +
>
> nit: The implementation of `Scheduler` is totally fine as-is. This is
> just my two cents, as this was mentioned off-list.
>
> I believe that for the implementation of the scheduler working with
> enum variants and a trait and then exploiting static dispatch is more
> convenient and easier to maintain, e.g.:
>
> pub enum Schedulerr<Nodes> {
> Topsis(Nodes),
> BruteForce(Nodes),
> }
>
> pub trait Decide {
> fn node_imbalance(&self) -> f64;
>
> fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64;
>
> fn score_best_balancing_migration_candidates(
> &self,
> candidates: &[MigrationCandidate],
> limit: usize,
> ) -> Result<Vec<ScoredMigration>, Error>;
> }
>
> impl Decide for Schedulerr<Vec<NodeUsage>> {
[...]
> }
>
> pub fn score_best_balancing_migration_candidates(
> scheduler: &impl Decide,
> candidates: &[MigrationCandidate],
> limit: usize,
> ) -> Result<Vec<ScoredMigration>, Error> {
> scheduler.score_best_balancing_migration_candidates(candidates, limit)
> }
>
> In a nutshell, this declares what a scheduler ought to be able to do to be used
> for scoring (that is, implementing the `Decide` trait); and implements
> all the functionality for all the variants in one place.
>
> Nice side-effects of this design:
> * one scoring function implements all the variants in one place, which is nice, I think
> * adding/removing a scheduler variant would become more systematic
> * modifying scheduler variants in terms of how they score or, for example, how they measure
> imbalance, would also be more straightforward
>
> Again, just my two cents.
As discussed off-list, I like how readable the different method
alternatives are with the pattern matching and that makes maintenance
and reading diffs easier.
Though I'm not sure whether it is a good idea to define the `Scheduler`
by the algorithms that one or more of its methods use. Choosing one
algorithm might only be something we want in the short term, e.g., while
we figure out whether TOPSIS or brute force is the right fit for users;
we might drop brute force or TOPSIS for these methods in the future, or
add another method which uses yet another algorithm.
Then it might be better that the methods themselves, such as
score_nodes_to_start_resource() and
score_best_balancing_migration_candidates() defines which algorithm
these use. In that case it is only coincidental that two methods use the
same algorithm internally.
Maybe we could still go for some parameter that allows passing the
preferred algorithm to score_best_balancing_migration_candidates(),
which for now can either be BruteForce or Topsis?
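To make that idea concrete, a rough sketch of such a parameter could look
like the following (all names here are illustrative stand-ins, not the
actual crate API; both match arms are placeholders that just sort
ascending):

```rust
/// Illustrative only: which algorithm the balancing method should use.
#[derive(Copy, Clone, Debug)]
pub enum BalancingAlgorithm {
    Topsis,
    BruteForce,
}

/// Simplified stand-in for the real Scheduler; node usages are elided.
pub struct Scheduler;

impl Scheduler {
    /// Only this method is coupled to the algorithm choice: the caller
    /// passes the preferred algorithm instead of it being baked into the
    /// Scheduler type. Candidates are plain f64 scores here for brevity.
    pub fn score_best_candidates(
        &self,
        algorithm: BalancingAlgorithm,
        candidates: &[f64],
        limit: usize,
    ) -> Vec<f64> {
        let mut scored = candidates.to_vec();
        match algorithm {
            // Placeholder: the real code would rank via TOPSIS.
            BalancingAlgorithm::Topsis => {
                scored.sort_by(|a, b| a.partial_cmp(b).unwrap());
            }
            // Placeholder: the real code would score exhaustively via a
            // BinaryHeap; here both branches just sort ascending.
            BalancingAlgorithm::BruteForce => {
                scored.sort_by(|a, b| a.partial_cmp(b).unwrap());
            }
        }
        scored.truncate(limit);
        scored
    }
}
```

That would keep a single Scheduler type while still letting the caller
decide per call which algorithm fits.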
Still, thanks a lot for the suggestion! I like the idea a lot, but I'm
only a little unsure whether it is the right fit in this situation wrt.
to future changes.
>
>> +impl Scheduler {
>> + /// Instantiate scheduler instance from node usages.
>> + pub fn from_nodes<I>(nodes: I) -> Self
>> + where
>> + I: IntoIterator<Item: Into<NodeUsage>>,
>> + {
>> + Self {
>> + nodes: nodes.into_iter().map(|node| node.into()).collect(),
>> + }
>> + }
>> +
>> + /// Scores nodes to start a resource with the usage statistics `resource_stats` on.
>> + ///
>> + /// The scoring is done as if the resource is already started on each node. This assumes that
>> + /// the already started resource consumes the maximum amount of each stat according to its
>> + /// `resource_stats`.
>> + ///
>> + /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
>> + /// score is better.
>> + pub fn score_nodes_to_start_resource<T: Into<ResourceStats>>(
>> + &self,
>> + resource_stats: T,
>> + ) -> Result<Vec<(String, f64)>, Error> {
>> + let len = self.nodes.len();
>> + let resource_stats = resource_stats.into();
>> +
>> + let matrix = self
>> + .nodes
>> + .iter()
>> + .enumerate()
>> + .map(|(target_index, _)| {
>> + // Base values on percentages to allow comparing nodes with different stats.
>> + let mut highest_cpu = 0.0;
>> + let mut squares_cpu = 0.0;
>> + let mut highest_mem = 0.0;
>> + let mut squares_mem = 0.0;
>> +
>> + for (index, node) in self.nodes.iter().enumerate() {
>> + let mut new_stats = node.stats;
>> +
>> + if index == target_index {
>> + new_stats.add_started_resource(&resource_stats)
>> + };
>> +
>> + let new_cpu = new_stats.cpu_load();
>> + highest_cpu = f64::max(highest_cpu, new_cpu);
>> + squares_cpu += new_cpu.powi(2);
>> +
>> + let new_mem = new_stats.mem_load();
>> + highest_mem = f64::max(highest_mem, new_mem);
>> + squares_mem += new_mem.powi(2);
>> + }
>> +
>> + // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
>> + // 1.004 is only slightly more than 1.002.
>> + PveTopsisAlternative {
>> + average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
>> + highest_cpu: 1.0 + highest_cpu,
>> + average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
>> + highest_memory: 1.0 + highest_mem,
>> + }
>> + .into()
>> + })
>> + .collect::<Vec<_>>();
>> +
>> + let scores =
>> + topsis::score_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
>> +
>> + Ok(scores
>> + .into_iter()
>> + .enumerate()
>> + .map(|(n, score)| (self.nodes[n].name.to_string(), score))
>> + .collect())
>> + }
>> }
>
> [snip]
>
> in the future, the proptest crate could come in handy for such helper
> functions.
>
> in general, I would propose to add proptests in the future.
>
> for some examples and inspiration, see [0].
>
> [0] https://lore.proxmox.com/all/20260306082046.34311-1-d.rusovac@proxmox.com/T/
+1
>
>> +fn new_homogeneous_cluster_scheduler() -> Scheduler {
>> + let (maxcpu, maxmem) = (16, 64 * (1 << 30));
>> +
>> + let node1 = NodeUsage {
>> + name: String::from("node1"),
>> + stats: NodeStats {
>> + cpu: 1.7,
>> + maxcpu,
>> + mem: 12334 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + let node2 = NodeUsage {
>> + name: String::from("node2"),
>> + stats: NodeStats {
>> + cpu: 15.184,
>> + maxcpu,
>> + mem: 529 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + let node3 = NodeUsage {
>> + name: String::from("node3"),
>> + stats: NodeStats {
>> + cpu: 5.2,
>> + maxcpu,
>> + mem: 9381 << 20,
>> + maxmem,
>> + },
>> + };
>> +
>> + Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>> +
>> +fn new_heterogeneous_cluster_scheduler() -> Scheduler {
>> + let node1 = NodeUsage {
>> + name: String::from("node1"),
>> + stats: NodeStats {
>> + cpu: 1.7,
>> + maxcpu: 16,
>> + mem: 12334 << 20,
>> + maxmem: 128 << 30,
>> + },
>> + };
>> +
>> + let node2 = NodeUsage {
>> + name: String::from("node2"),
>> + stats: NodeStats {
>> + cpu: 15.184,
>> + maxcpu: 32,
>> + mem: 529 << 20,
>> + maxmem: 96 << 30,
>> + },
>> + };
>> +
>> + let node3 = NodeUsage {
>> + name: String::from("node3"),
>> + stats: NodeStats {
>> + cpu: 5.2,
>> + maxcpu: 24,
>> + mem: 9381 << 20,
>> + maxmem: 64 << 30,
>> + },
>> + };
>> +
>> + Scheduler::from_nodes(vec![node1, node2, node3])
>> +}
>
> [snip]
>
>> +#[test]
>> +fn test_score_homogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> + let scheduler = new_homogeneous_cluster_scheduler();
>> +
>> + let heavy_memory_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 1.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> + vec!["node2", "node3", "node1"]
>> + );
>> +
>> + let heavy_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 0.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>
> nit: confusing variable name
ACK this and the following, that was a copy-paste error unfortunately.
>
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>
> nit: confusing variable name
ACK
>
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node2", "node3", "node1"]
>> + );
>> +
>> + Ok(())
>> +}
>> +
>> +#[test]
>> +fn test_score_heterogeneous_nodes_to_start_resource() -> Result<(), Error> {
>> + let scheduler = new_heterogeneous_cluster_scheduler();
>> +
>> + let heavy_memory_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 1.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_memory_resource_stats)?,
>> + vec!["node2", "node1", "node3"]
>> + );
>> +
>> + let heavy_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, heavy_cpu_resource_stats)?,
>> + vec!["node3", "node2", "node1"]
>> + );
>> +
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 0.0,
>> + mem: 0,
>> + maxmem: 0,
>> + };
>> +
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node1", "node3", "node2"]
>> + );
>> +
>
> nit: confusing variable name
>
ACK
>> + let unlimited_cpu_resource_stats = ResourceStats {
>> + cpu: 0.0,
>> + maxcpu: 12.0,
>> + mem: 0,
>> + maxmem: 12 << 30,
>> + };
>> +
>
> nit: confusing variable name
>
ACK
>> + assert_eq!(
>> + rank_nodes_to_start_resource(&scheduler, unlimited_cpu_resource_stats)?,
>> + vec!["node2", "node1", "node3"]
>> + );
>> +
>> + Ok(())
>> +}
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (3 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 04/40] resource-scheduling: introduce generic scheduler implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:28 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
` (34 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is a more generic version of the `Usage` implementation from the
pve_static bindings in the pve_rs repository.
As the upcoming load balancing scheduler actions and dynamic resource
scheduler will need more information about each resource, this further
improves on the state tracking of each resource:
In this implementation, a resource is composed of its usage statistics
and its two essential states: the running state and the node placement.
The non_exhaustive attribute ensures that users need to construct a
Resource instance through its API.
Users can repeatedly use the current state of Usage to make scheduling
decisions with the to_scheduler() method. This method takes an
implementation of UsageAggregator, which dictates how the usage
information is represented to the Scheduler.
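Roughly, the intended split looks like the following sketch (simplified
stand-ins for illustration; the real types carry full cpu/memory
statistics and placement state, and to_scheduler() here returns the
aggregated node usages directly instead of a Scheduler):

```rust
use std::collections::HashMap;

// Simplified stand-in: nodename -> a single load value.
#[derive(Default)]
pub struct Usage {
    nodes: HashMap<String, f64>,
}

pub struct NodeUsage {
    pub name: String,
    pub load: f64,
}

/// Dictates how the cluster usage is represented to the scheduler.
pub trait UsageAggregator {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
}

/// Example aggregator that passes node loads through unchanged.
pub struct IdentityAggregator;

impl UsageAggregator for IdentityAggregator {
    fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
        usage
            .nodes
            .iter()
            .map(|(name, load)| NodeUsage {
                name: name.clone(),
                load: *load,
            })
            .collect()
    }
}

impl Usage {
    pub fn add_node(&mut self, name: String, load: f64) {
        self.nodes.insert(name, load);
    }

    /// Build the scheduler input from the current state; callable
    /// repeatedly as the state evolves.
    pub fn to_scheduler<A: UsageAggregator>(&self) -> Vec<NodeUsage> {
        A::aggregate(self)
    }
}
```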
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
This patch is added to move the handling of specific usage stats and
their (de)serialization to the pve-rs bindings and have the general
functionality in this crate.
proxmox-resource-scheduling/src/lib.rs | 1 +
proxmox-resource-scheduling/src/node.rs | 40 +++++
proxmox-resource-scheduling/src/resource.rs | 119 +++++++++++++
proxmox-resource-scheduling/src/usage.rs | 183 ++++++++++++++++++++
proxmox-resource-scheduling/tests/usage.rs | 153 ++++++++++++++++
5 files changed, 496 insertions(+)
create mode 100644 proxmox-resource-scheduling/src/usage.rs
create mode 100644 proxmox-resource-scheduling/tests/usage.rs
diff --git a/proxmox-resource-scheduling/src/lib.rs b/proxmox-resource-scheduling/src/lib.rs
index 12b743fe..99ca16d8 100644
--- a/proxmox-resource-scheduling/src/lib.rs
+++ b/proxmox-resource-scheduling/src/lib.rs
@@ -3,6 +3,7 @@ pub mod topsis;
pub mod node;
pub mod resource;
+pub mod usage;
pub mod scheduler;
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index e6227eda..be462782 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -1,3 +1,5 @@
+use std::collections::HashSet;
+
use crate::resource::ResourceStats;
/// Usage statistics of a node.
@@ -37,3 +39,41 @@ impl NodeStats {
self.mem as f64 / self.maxmem as f64
}
}
+
+/// A node in the cluster context.
+#[derive(Clone, Debug)]
+pub struct Node {
+ /// Base stats of the node.
+ stats: NodeStats,
+ /// The identifiers of the resources assigned to the node.
+ resources: HashSet<String>,
+}
+
+impl Node {
+ pub fn new(stats: NodeStats) -> Self {
+ Self {
+ stats,
+ resources: HashSet::new(),
+ }
+ }
+
+ pub fn add_resource(&mut self, sid: &str) -> bool {
+ self.resources.insert(sid.to_string())
+ }
+
+ pub fn remove_resource(&mut self, sid: &str) -> bool {
+ self.resources.remove(sid)
+ }
+
+ pub fn stats(&self) -> NodeStats {
+ self.stats
+ }
+
+ pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
+ self.resources.iter()
+ }
+
+ pub fn contains_resource(&self, sid: &str) -> bool {
+ self.resources.contains(sid)
+ }
+}
diff --git a/proxmox-resource-scheduling/src/resource.rs b/proxmox-resource-scheduling/src/resource.rs
index 1eb9d15e..2aa16a51 100644
--- a/proxmox-resource-scheduling/src/resource.rs
+++ b/proxmox-resource-scheduling/src/resource.rs
@@ -1,5 +1,7 @@
use std::{iter::Sum, ops::Add};
+use anyhow::{bail, Error};
+
/// Usage statistics for a resource.
#[derive(Copy, Clone, PartialEq, PartialOrd, Debug, Default)]
pub struct ResourceStats {
@@ -31,3 +33,120 @@ impl Sum for ResourceStats {
iter.fold(Self::default(), |a, b| a + b)
}
}
+
+/// Execution state of a resource.
+#[derive(Copy, Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourceState {
+ /// The resource is stopped.
+ Stopped,
+ /// The resource is scheduled to start.
+ Starting,
+ /// The resource is started and currently running.
+ Started,
+}
+
+/// Placement of a resource.
+#[derive(Clone, PartialEq, Eq, Debug)]
+#[non_exhaustive]
+pub enum ResourcePlacement {
+ /// The resource is on `current_node`.
+ Stationary { current_node: String },
+ /// The resource is being moved from `current_node` to `target_node`.
+ Moving {
+ current_node: String,
+ target_node: String,
+ },
+}
+
+impl ResourcePlacement {
+ fn nodenames(&self) -> Vec<&str> {
+ match self {
+ ResourcePlacement::Stationary { current_node } => vec![&current_node],
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => vec![&current_node, &target_node],
+ }
+ }
+}
+
+/// A resource in the cluster context.
+#[derive(Clone, Debug)]
+#[non_exhaustive]
+pub struct Resource {
+ /// The usage statistics of the resource.
+ stats: ResourceStats,
+ /// The execution state of the resource.
+ state: ResourceState,
+ /// The placement of the resource.
+ placement: ResourcePlacement,
+}
+
+impl Resource {
+ pub fn new(stats: ResourceStats, state: ResourceState, placement: ResourcePlacement) -> Self {
+ Self {
+ stats,
+ state,
+ placement,
+ }
+ }
+
+ /// Put the resource into a moving state with `target_node`.
+ ///
+ /// This method fails if the resource is already moving.
+ pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
+ match &self.placement {
+ ResourcePlacement::Stationary { current_node } => {
+ self.placement = ResourcePlacement::Moving {
+ current_node: current_node.to_string(),
+ target_node,
+ };
+ }
+ ResourcePlacement::Moving { .. } => bail!("resource is already moving"),
+ };
+
+ Ok(())
+ }
+
+ /// Handles the external removal of a node.
+ ///
+ /// Returns whether the resource does not have any node left.
+ pub fn remove_node(&mut self, nodename: &str) -> bool {
+ match &self.placement {
+ ResourcePlacement::Stationary { current_node } => current_node == nodename,
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => {
+ if current_node == nodename {
+ self.placement = ResourcePlacement::Stationary {
+ current_node: target_node.to_string(),
+ };
+ } else if target_node == nodename {
+ self.placement = ResourcePlacement::Stationary {
+ current_node: current_node.to_string(),
+ };
+ }
+
+ false
+ }
+ }
+ }
+
+ pub fn state(&self) -> ResourceState {
+ self.state
+ }
+
+ pub fn stats(&self) -> ResourceStats {
+ self.stats
+ }
+
+ pub fn placement(&self) -> &ResourcePlacement {
+ &self.placement
+ }
+
+ pub fn nodenames(&self) -> Vec<&str> {
+ self.placement.nodenames()
+ }
+}
diff --git a/proxmox-resource-scheduling/src/usage.rs b/proxmox-resource-scheduling/src/usage.rs
new file mode 100644
index 00000000..78ccc453
--- /dev/null
+++ b/proxmox-resource-scheduling/src/usage.rs
@@ -0,0 +1,183 @@
+use anyhow::{bail, Error};
+
+use std::collections::HashMap;
+
+use crate::{
+ node::{Node, NodeStats},
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ scheduler::{NodeUsage, Scheduler},
+};
+
+/// The state of the usage in the cluster.
+///
+/// The cluster usage represents the current state of the assignments between nodes and resources
+/// and their usage statistics. A resource can be placed on these nodes according to their
+/// placement state. See [`crate::resource::Resource`] for more information.
+///
+/// The cluster usage state can be used to build a current state for the [`Scheduler`].
+#[derive(Default)]
+pub struct Usage {
+ nodes: HashMap<String, Node>,
+ resources: HashMap<String, Resource>,
+}
+
+/// An aggregator for the [`Usage`] maps the cluster usage to node usage statistics that are
+/// relevant for the scheduler.
+pub trait UsageAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage>;
+}
+
+impl Usage {
+ /// Instantiate an empty cluster usage.
+ pub fn new() -> Self {
+ Self::default()
+ }
+
+ /// Add a node to the cluster usage.
+ ///
+ /// This method fails if a node with the same `nodename` already exists.
+ pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
+ if self.nodes.contains_key(&nodename) {
+ bail!("node '{}' already exists", nodename);
+ }
+
+ self.nodes.insert(nodename, Node::new(stats));
+
+ Ok(())
+ }
+
+ /// Remove a node from the cluster usage.
+ pub fn remove_node(&mut self, nodename: &str) {
+ if let Some(node) = self.nodes.remove(nodename) {
+ node.resources_iter().for_each(|sid| {
+ if let Some(resource) = self.resources.get_mut(sid)
+ && resource.remove_node(nodename)
+ {
+ self.resources.remove(sid);
+ }
+ });
+ }
+ }
+
+ /// Returns a reference to the [`Node`] with the identifier `nodename`.
+ pub fn get_node(&self, nodename: &str) -> Option<&Node> {
+ self.nodes.get(nodename)
+ }
+
+ /// Returns an iterator for the cluster usage's nodes.
+ pub fn nodes_iter(&self) -> impl Iterator<Item = (&String, &Node)> {
+ self.nodes.iter()
+ }
+
+ /// Returns an iterator over the cluster usage's node names.
+ pub fn nodenames_iter(&self) -> impl Iterator<Item = &String> {
+ self.nodes.keys()
+ }
+
+ /// Returns whether the node with the identifier `nodename` is present in the cluster usage.
+ pub fn contains_node(&self, nodename: &str) -> bool {
+ self.nodes.contains_key(nodename)
+ }
+
+ fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
+ if nodenames
+ .iter()
+ .any(|nodename| !self.nodes.contains_key(*nodename))
+ {
+ bail!("resource nodes do not exist");
+ }
+
+ nodenames.iter().for_each(|nodename| {
+ if let Some(node) = self.nodes.get_mut(*nodename) {
+ node.add_resource(sid);
+ }
+ });
+
+ Ok(())
+ }
+
+ fn remove_resource_from_nodes(&mut self, sid: &str, nodenames: &[&str]) {
+ nodenames.iter().for_each(|nodename| {
+ if let Some(node) = self.nodes.get_mut(*nodename) {
+ node.remove_resource(sid);
+ }
+ });
+ }
+
+ /// Add `resource` with identifier `sid` to cluster usage.
+ ///
+ /// This method fails if a resource with the same `sid` already exists or the resource's nodes
+ /// do not exist in the cluster usage.
+ pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
+ if self.resources.contains_key(&sid) {
+ bail!("resource '{}' already exists", sid);
+ }
+
+ self.add_resource_to_nodes(&sid, resource.nodenames())?;
+
+ self.resources.insert(sid.to_string(), resource);
+
+ Ok(())
+ }
+
+ /// Add `stats` from resource with identifier `sid` to node `nodename` in cluster usage.
+ ///
+ /// For the first call, the resource is assumed to be started and stationary on the given node.
+ /// If there was no intermediate call to remove the resource, the second call will assume that
+ /// the given node is the target node and the resource is being moved there. The second call
+ /// will ignore the value of `stats`.
+ #[deprecated = "only for backwards compatibility, use add_resource(...) instead"]
+ pub fn add_resource_usage_to_node(
+ &mut self,
+ nodename: &str,
+ sid: &str,
+ stats: ResourceStats,
+ ) -> Result<(), Error> {
+ if let Some(resource) = self.resources.get_mut(sid) {
+ resource.moving_to(nodename.to_string())?;
+
+ self.add_resource_to_nodes(sid, vec![nodename])
+ } else {
+ let placement = ResourcePlacement::Stationary {
+ current_node: nodename.to_string(),
+ };
+ let resource = Resource::new(stats, ResourceState::Started, placement);
+
+ self.add_resource(sid.to_string(), resource)
+ }
+ }
+
+ /// Remove resource with identifier `sid` from cluster usage.
+ pub fn remove_resource(&mut self, sid: &str) {
+ if let Some(resource) = self.resources.remove(sid) {
+ match resource.placement() {
+ ResourcePlacement::Stationary { current_node } => {
+ self.remove_resource_from_nodes(sid, &[current_node]);
+ }
+ ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ } => {
+ self.remove_resource_from_nodes(sid, &[current_node, target_node]);
+ }
+ }
+ }
+ }
+
+ /// Returns a reference to the [`Resource`] with the identifier `sid`.
+ pub fn get_resource(&self, sid: &str) -> Option<&Resource> {
+ self.resources.get(sid)
+ }
+
+ /// Returns an iterator for the cluster usage's resources.
+ pub fn resources_iter(&self) -> impl Iterator<Item = (&String, &Resource)> {
+ self.resources.iter()
+ }
+
+ /// Use the current cluster usage as a base for a scheduling action.
+ pub fn to_scheduler<F: UsageAggregator>(&self) -> Scheduler {
+ let node_usages = F::aggregate(self);
+
+ Scheduler::from_nodes(node_usages)
+ }
+}
diff --git a/proxmox-resource-scheduling/tests/usage.rs b/proxmox-resource-scheduling/tests/usage.rs
new file mode 100644
index 00000000..eb00d2c6
--- /dev/null
+++ b/proxmox-resource-scheduling/tests/usage.rs
@@ -0,0 +1,153 @@
+use anyhow::{bail, Error};
+use proxmox_resource_scheduling::{
+ node::NodeStats,
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ usage::Usage,
+};
+
+#[test]
+fn test_no_duplicate_nodes() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+
+ match usage.add_node("node1".to_string(), NodeStats::default()) {
+ Ok(_) => bail!("cluster usage does allow duplicate node entries"),
+ Err(_) => Ok(()),
+ }
+}
+
+#[test]
+fn test_no_duplicate_resources() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Stationary {
+ current_node: "node1".to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource("vm:101".to_string(), resource.clone())?;
+
+ match usage.add_resource("vm:101".to_string(), resource) {
+ Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
+ Err(_) => Ok(()),
+ }
+}
+
+#[test]
+#[allow(deprecated)]
+fn test_add_resource_usage_to_node() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ usage.add_node("node1".to_string(), NodeStats::default())?;
+ usage.add_node("node2".to_string(), NodeStats::default())?;
+ usage.add_node("node3".to_string(), NodeStats::default())?;
+
+ usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
+ usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
+
+ if usage
+ .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
+ .is_ok()
+ {
+ bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
+ }
+
+ Ok(())
+}
+
+#[test]
+fn test_add_remove_stationary_resource() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ let (sid, nodename) = ("vm:101", "node1");
+
+ usage.add_node(nodename.to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Stationary {
+ current_node: nodename.to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource(sid.to_string(), resource)?;
+
+ match (usage.get_resource(sid), usage.get_node(nodename)) {
+ (Some(_), Some(node)) => {
+ if !node.contains_resource(sid) {
+ bail!("resource '{sid}' was not added to node '{nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
+ }
+
+ usage.remove_resource(sid);
+
+ match (usage.get_resource(sid), usage.get_node(nodename)) {
+ (None, Some(node)) => {
+ if node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from node '{nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' was not removed"),
+ }
+
+ Ok(())
+}
+
+#[test]
+fn test_add_remove_moving_resource() -> Result<(), Error> {
+ let mut usage = Usage::new();
+
+ let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
+
+ usage.add_node(current_nodename.to_string(), NodeStats::default())?;
+ usage.add_node(target_nodename.to_string(), NodeStats::default())?;
+
+ let placement = ResourcePlacement::Moving {
+ current_node: current_nodename.to_string(),
+ target_node: target_nodename.to_string(),
+ };
+ let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
+
+ usage.add_resource(sid.to_string(), resource)?;
+
+ match (
+ usage.get_resource(sid),
+ usage.get_node(current_nodename),
+ usage.get_node(target_nodename),
+ ) {
+ (Some(_), Some(current_node), Some(target_node)) => {
+ if !current_node.contains_resource("vm:101") {
+ bail!("resource '{sid}' was not added to current node '{current_nodename}'");
+ }
+
+ if !target_node.contains_resource("vm:101") {
+ bail!("resource '{sid}' was not added to target node '{target_nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' or nodes were not added"),
+ }
+
+ usage.remove_resource(sid);
+
+ match (
+ usage.get_resource(sid),
+ usage.get_node(current_nodename),
+ usage.get_node(target_nodename),
+ ) {
+ (None, Some(current_node), Some(target_node)) => {
+ if current_node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
+ }
+
+ if target_node.contains_resource(sid) {
+ bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
+ }
+ }
+ _ => bail!("resource '{sid}' was not removed"),
+ }
+
+ Ok(())
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-26 10:28 ` Dominik Rusovac
2026-03-26 14:15 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:28 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly relating to nits or tiny things
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that users need to construct a
> Resource instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> This patch is added to move the handling of specific usage stats and
> their (de)serialization to the pve-rs bindings and have the general
> functionality in this crate.
[snip]
nit: imo, it's more convenient to expose the more ergonomic `&str` type,
using:
pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
self.resources.iter().map(String::as_str)
}
> + pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
> + self.resources.iter()
> + }
[snip]
> + pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => {
> + self.placement = ResourcePlacement::Moving {
> + current_node: current_node.to_string(),
nit:
current_node: current_node.to_owned(),
represents the intention best, that is, owning rather than converting
[snip]
> + /// Handles the external removal of a node.
> + ///
> + /// Returns whether the resource does not have any node left.
Considering what it does, I find the name of this function a bit confusing.
> + pub fn remove_node(&mut self, nodename: &str) -> bool {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => current_node == nodename,
> + ResourcePlacement::Moving {
> + current_node,
> + target_node,
> + } => {
> + if current_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: target_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + } else if target_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: current_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + }
> +
> + false
> + }
> + }
> + }
[snip]
> + /// Add a node to the cluster usage.
> + ///
> + /// This method fails if a node with the same `nodename` already exists.
> + pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
> + if self.nodes.contains_key(&nodename) {
> + bail!("node '{}' already exists", nodename);
nit:
bail!("node '{nodename}' already exists");
> + }
[snip]
we are reading only, consider using a slice for `nodenames` here (just
like for `remove_resource_from_nodes`):
fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {
pls find the related changes [0] and [1].
> + fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
> + if nodenames
> + .iter()
> + .any(|nodename| !self.nodes.contains_key(*nodename))
> + {
> + bail!("resource nodes do not exist");
> + }
> +
> + nodenames.iter().for_each(|nodename| {
> + if let Some(node) = self.nodes.get_mut(*nodename) {
> + node.add_resource(sid);
> + }
> + });
> +
> + Ok(())
> + }
[snip]
> + /// Add `resource` with identifier `sid` to cluster usage.
> + ///
> + /// This method fails if a resource with the same `sid` already exists or the resource's nodes
> + /// do not exist in the cluster usage.
> + pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
> + if self.resources.contains_key(&sid) {
> + bail!("resource '{}' already exists", sid);
> + }
> +
> + self.add_resource_to_nodes(&sid, resource.nodenames())?;
[0]:
self.add_resource_to_nodes(&sid, &resource.nodenames())?;
> +
> + self.resources.insert(sid.to_string(), resource);
nit: to_owned() instead of to_string() represents the intention best
[snip]
> + pub fn add_resource_usage_to_node(
> + &mut self,
> + nodename: &str,
> + sid: &str,
> + stats: ResourceStats,
> + ) -> Result<(), Error> {
> + if let Some(resource) = self.resources.get_mut(sid) {
> + resource.moving_to(nodename.to_string())?;
> +
> + self.add_resource_to_nodes(sid, vec![nodename])
[1]:
self.add_resource_to_nodes(sid, &[nodename])
[snip]
> +#[test]
> +fn test_no_duplicate_nodes() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + match usage.add_node("node1".to_string(), NodeStats::default()) {
> + Ok(_) => bail!("cluster usage does allow duplicate node entries"),
> + Err(_) => Ok(()),
> + }
since this is supposed to be a test case, I would rather assert instead
of bail, using:
assert!(
usage
.add_node("node1".to_string(), NodeStats::default())
.is_err(),
"cluster usage allows duplicate node entries"
);
> +}
> +
> +#[test]
> +fn test_no_duplicate_resources() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: "node1".to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource("vm:101".to_string(), resource.clone())?;
> +
> + match usage.add_resource("vm:101".to_string(), resource) {
> + Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
> + Err(_) => Ok(()),
> + }
assert instead of bail:
assert!(
usage.add_resource("vm:101".to_string(), resource).is_err(),
"cluster usage allows duplicate resource entries"
);
> +}
> +
> +#[test]
> +#[allow(deprecated)]
> +fn test_add_resource_usage_to_node() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> + usage.add_node("node2".to_string(), NodeStats::default())?;
> + usage.add_node("node3".to_string(), NodeStats::default())?;
> +
> + usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
> + usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
> +
> + if usage
> + .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
> + .is_ok()
> + {
> + bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
> + }
assert instead of bail:
assert!(
usage
.add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
.is_err(),
"add_resource_usage_to_node() allows adding resource to more than two nodes"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_stationary_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, nodename) = ("vm:101", "node1");
> +
> + usage.add_node(nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (Some(_), Some(node)) => {
> + if !node.contains_resource(sid) {
> + bail!("resource '{sid}' was not added to node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_some(),
"resource '{sid}' was not added"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
node.contains_resource(sid),
"resource '{sid}' was not added to node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + usage.remove_resource(sid);
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (None, Some(node)) => {
> + if node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_none(),
"resource '{sid}' was not removed"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
!node.contains_resource(sid),
"resource '{sid}' was not removed from node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_moving_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
> +
> + usage.add_node(current_nodename.to_string(), NodeStats::default())?;
> + usage.add_node(target_nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Moving {
> + current_node: current_nodename.to_string(),
> + target_node: target_nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
analogously, here I'd find asserting more appropriate than bailing
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (Some(_), Some(current_node), Some(target_node)) => {
> + if !current_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to current node '{current_nodename}'");
> + }
> +
> + if !target_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or nodes were not added"),
> + }
> +
> + usage.remove_resource(sid);
analogously, here I'd find asserting more appropriate than bailing
> +
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (None, Some(current_node), Some(target_node)) => {
> + if current_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
> + }
> +
> + if target_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
> +
> + Ok(())
> +}
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
2026-03-26 10:28 ` Dominik Rusovac
@ 2026-03-26 14:15 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:15 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:28 AM CET, Dominik Rusovac wrote:
> lgtm
>
> pls find my comments inline, mostly relating to nits or tiny things
>
> On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
>> This is a more generic version of the `Usage` implementation from the
>> pve_static bindings in the pve_rs repository.
>>
>> As the upcoming load balancing scheduler actions and dynamic resource
>> scheduler will need more information about each resource, this further
>> improves on the state tracking of each resource:
>>
>> In this implementation, a resource is composed of its usage statistics
>> and its two essential states: the running state and the node placement.
>> The non_exhaustive attribute ensures that users need to construct a
>> Resource instance through its API.
>>
>> Users can repeatedly use the current state of Usage to make scheduling
>> decisions with the to_scheduler() method. This method takes an
>> implementation of UsageAggregator, which dictates how the usage
>> information is represented to the Scheduler.
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> changes v1 -> v2:
>> - new!
>>
>> This patch is added to move the handling of specific usage stats and
>> their (de)serialization to the pve-rs bindings and have the general
>> functionality in this crate.
>
> [snip]
>
> nit: imo, it's more convenient to expose the more ergonomic `&str` type,
> using:
>
> pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
> self.resources.iter().map(String::as_str)
> }
>
Thanks, will do that!
>> + pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
>> + self.resources.iter()
>> + }
>
> [snip]
>
>> + pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
>> + match &self.placement {
>> + ResourcePlacement::Stationary { current_node } => {
>> + self.placement = ResourcePlacement::Moving {
>> + current_node: current_node.to_string(),
>
> nit:
>
> current_node: current_node.to_owned(),
>
> represents the intention best, that is, owning rather than converting
>
> [snip]
Thanks, will do so for this and the rest!
[...]
>> + /// Add a node to the cluster usage.
>> + ///
>> + /// This method fails if a node with the same `nodename` already exists.
>> + pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
>> + if self.nodes.contains_key(&nodename) {
>> + bail!("node '{}' already exists", nodename);
>
> nit:
>
> bail!("node '{nodename}' already exists");
>
ACK
>> + }
>
> [snip]
>
> we are reading only, consider using a slice for `nodenames` here (just
> like for `remove_resource_from_nodes`):
>
> fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {
>
> pls find the related changes [0] and [1].
>
Right, that makes more sense, will go for that!
[...]
>> +#[test]
>> +fn test_no_duplicate_nodes() -> Result<(), Error> {
>> + let mut usage = Usage::new();
>> +
>> + usage.add_node("node1".to_string(), NodeStats::default())?;
>> +
>> + match usage.add_node("node1".to_string(), NodeStats::default()) {
>> + Ok(_) => bail!("cluster usage does allow duplicate node entries"),
>> + Err(_) => Ok(()),
>> + }
>
> since this is supposed to be a test case, I would rather assert instead
> of bail, using:
>
> assert!(
> usage
> .add_node("node1".to_string(), NodeStats::default())
> .is_err(),
> "cluster usage allows duplicate node entries"
> );
>
Right, that's more appropriate, will do so here and for all the
following, thanks!
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (4 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:29 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
` (33 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Iterator::min_by(...) and Iterator::max_by(...) only return `None` if
there are no entries in the `Matrix` column at all. This can only happen
if the `Matrix` doesn't have any row entries.
This will make any call to score_alternatives(...), the only current
user of IdealAlternatives::compute(...), panic if there are no given
alternatives. Therefore use reasonable default values.
This has not happened yet, because the only non-test caller of
score_alternatives(...) is score_nodes_to_start_resource(...), which
always has nodes present in production.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
This can happen with the next patch if
score_best_balancing_migration_candidates() is called with an empty
candidates vec, which is trivially possible for pve-ha-manager in a
cluster with high imbalance, but no configured HA resources or all HA
resources being so constrained that no migration is possible.
proxmox-resource-scheduling/src/topsis.rs | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/proxmox-resource-scheduling/src/topsis.rs b/proxmox-resource-scheduling/src/topsis.rs
index 6d078aa6..ed5a9bd1 100644
--- a/proxmox-resource-scheduling/src/topsis.rs
+++ b/proxmox-resource-scheduling/src/topsis.rs
@@ -145,8 +145,10 @@ impl<const N: usize> IdealAlternatives<N> {
let min = fixed_criterion
.clone()
.min_by(|a, b| a.total_cmp(b))
- .unwrap();
- let max = fixed_criterion.max_by(|a, b| a.total_cmp(b)).unwrap();
+ .unwrap_or(f64::NEG_INFINITY);
+ let max = fixed_criterion
+ .max_by(|a, b| a.total_cmp(b))
+ .unwrap_or(f64::INFINITY);
(best[n], worst[n]) = match criteria[n].maximize {
true => (max, min),
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
@ 2026-03-26 10:29 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:29 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
[snip]
> diff --git a/proxmox-resource-scheduling/src/topsis.rs b/proxmox-resource-scheduling/src/topsis.rs
> index 6d078aa6..ed5a9bd1 100644
> --- a/proxmox-resource-scheduling/src/topsis.rs
> +++ b/proxmox-resource-scheduling/src/topsis.rs
> @@ -145,8 +145,10 @@ impl<const N: usize> IdealAlternatives<N> {
> let min = fixed_criterion
> .clone()
> .min_by(|a, b| a.total_cmp(b))
> - .unwrap();
> - let max = fixed_criterion.max_by(|a, b| a.total_cmp(b)).unwrap();
> + .unwrap_or(f64::NEG_INFINITY);
> + let max = fixed_criterion
> + .max_by(|a, b| a.total_cmp(b))
> + .unwrap_or(f64::INFINITY);
that's a very nice idea!
[snip]
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (5 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 06/40] resource-scheduling: topsis: handle empty criteria without panics Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:29 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
` (32 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Even though comparing by index is slightly faster here, comparing by
nodename makes it possible to factor this out in an upcoming patch.
This should increase runtime only marginally, as the comparison cost is
roughly bounded by 2 * node_count * maximum_hostname_length.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/scheduler.rs | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index bb38f238..47abffb1 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -61,18 +61,17 @@ impl Scheduler {
let matrix = self
.nodes
.iter()
- .enumerate()
- .map(|(target_index, _)| {
+ .map(|node| {
// Base values on percentages to allow comparing nodes with different stats.
let mut highest_cpu = 0.0;
let mut squares_cpu = 0.0;
let mut highest_mem = 0.0;
let mut squares_mem = 0.0;
- for (index, node) in self.nodes.iter().enumerate() {
- let mut new_stats = node.stats;
+ for target_node in self.nodes.iter() {
+ let mut new_stats = target_node.stats;
- if index == target_index {
+ if node.name == target_node.name {
new_stats.add_started_resource(&resource_stats)
};
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (6 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 07/40] resource-scheduling: compare by nodename in score_nodes_to_start_resource Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:30 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
` (31 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The same calculation will be needed for the scoring of migrations with
the TOPSIS method in the following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
proxmox-resource-scheduling/src/scheduler.rs | 68 ++++++++++++--------
1 file changed, 42 insertions(+), 26 deletions(-)
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 47abffb1..69dc6f4e 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -43,6 +43,44 @@ impl Scheduler {
}
}
+ /// Map the current node usages to a [`PveTopsisAlternative`].
+ ///
+ /// The [`PveTopsisAlternative`] is derived by calculating a modified version of the root mean
+ /// square (RMS) and maximum value of each stat in the node usages.
+ fn topsis_alternative_with(
+ &self,
+ map_node_stats: impl Fn(&NodeUsage) -> NodeStats,
+ ) -> PveTopsisAlternative {
+ let len = self.nodes.len();
+
+ // Base values on percentages to allow comparing nodes with different stats.
+ let mut highest_cpu = 0.0;
+ let mut squares_cpu = 0.0;
+ let mut highest_mem = 0.0;
+ let mut squares_mem = 0.0;
+
+ for node in self.nodes.iter() {
+ let new_stats = map_node_stats(node);
+
+ let new_cpu = new_stats.cpu_load();
+ highest_cpu = f64::max(highest_cpu, new_cpu);
+ squares_cpu += new_cpu.powi(2);
+
+ let new_mem = new_stats.mem_load();
+ highest_mem = f64::max(highest_mem, new_mem);
+ squares_mem += new_mem.powi(2);
+ }
+
+ // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
+ // 1.004 is only slightly more than 1.002.
+ PveTopsisAlternative {
+ average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
+ highest_cpu: 1.0 + highest_cpu,
+ average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
+ highest_memory: 1.0 + highest_mem,
+ }
+ }
+
/// Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// The scoring is done as if the resource is already started on each node. This assumes that
@@ -55,43 +93,21 @@ impl Scheduler {
&self,
resource_stats: T,
) -> Result<Vec<(String, f64)>, Error> {
- let len = self.nodes.len();
let resource_stats = resource_stats.into();
let matrix = self
.nodes
.iter()
.map(|node| {
- // Base values on percentages to allow comparing nodes with different stats.
- let mut highest_cpu = 0.0;
- let mut squares_cpu = 0.0;
- let mut highest_mem = 0.0;
- let mut squares_mem = 0.0;
-
- for target_node in self.nodes.iter() {
+ self.topsis_alternative_with(|target_node| {
let mut new_stats = target_node.stats;
if node.name == target_node.name {
new_stats.add_started_resource(&resource_stats)
- };
+ }
- let new_cpu = new_stats.cpu_load();
- highest_cpu = f64::max(highest_cpu, new_cpu);
- squares_cpu += new_cpu.powi(2);
-
- let new_mem = new_stats.mem_load();
- highest_mem = f64::max(highest_mem, new_mem);
- squares_mem += new_mem.powi(2);
- }
-
- // Add 1.0 to avoid boosting tiny differences: e.g. 0.004 is twice as much as 0.002, but
- // 1.004 is only slightly more than 1.002.
- PveTopsisAlternative {
- average_cpu: 1.0 + (squares_cpu / len as f64).sqrt(),
- highest_cpu: 1.0 + highest_cpu,
- average_memory: 1.0 + (squares_mem / len as f64).sqrt(),
- highest_memory: 1.0 + highest_mem,
- }
+ new_stats
+ })
.into()
})
.collect::<Vec<_>>();
--
2.47.3
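The RMS-plus-max aggregation factored out in this patch can be sketched standalone. This is a simplified sketch, not the crate's API: `topsis_stats` is an illustrative name, and it assumes the per-node loads are already given as fractions.

```rust
// Sketch of the aggregation behind topsis_alternative_with: for a list of
// per-node loads (fractions), compute the root mean square and the maximum,
// each shifted by 1.0.
fn topsis_stats(loads: &[f64]) -> (f64, f64) {
    let len = loads.len() as f64;
    let mut highest = 0.0_f64;
    let mut squares = 0.0_f64;
    for &load in loads {
        highest = highest.max(load);
        squares += load.powi(2);
    }
    // Add 1.0 so tiny absolute differences are not boosted by the ratio
    // (0.004 is twice 0.002, but 1.004 is only slightly more than 1.002).
    (1.0 + (squares / len).sqrt(), 1.0 + highest)
}

fn main() {
    let (rms, max) = topsis_stats(&[0.5, 0.5, 0.5]);
    assert!((rms - 1.5).abs() < 1e-12);
    assert!((max - 1.5).abs() < 1e-12);
    println!("rms={rms} max={max}");
}
```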
* [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (7 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 08/40] resource-scheduling: factor out topsis alternative mapping Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-26 10:34 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
` (30 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
Assuming that a resource will hold the same dynamic resource usage on a
new node as on the previous node, score possible migrations, where:
- the cluster node imbalance is minimal (bruteforce), or
- the shifted root mean square and maximum resource usages of the cpu
and memory are minimal across the cluster nodes (TOPSIS).
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add saturating_sub() in remove_running_resource(...) (as suggested by
@Thomas)
- slightly move declarations and impls around so that reading from
top-to-bottom is a little easier
- pass NodeUsage vec instead of NodeStats vec to
calculate_node_imbalance(...)
- pass a closure to calculate_node_imbalance(...) (as suggested by
@Dominik)
- also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
struct is now ordered first by the imbalance and then the strings in
the `Migration` struct
- fix floating-point issue for the imbalance ordering for
ScoredMigration
- correctly implement `Ord` (essentially removing the reverse() and
moving these Reverse() wrappers to the usages for the BinaryHeap)
- use the `Migration` struct in `MigrationCandidate` as well
- drop Scheduler::node_stats() as it's unused now
- use Vec::with_capacity(...) where possible
- eagerly implement common traits (especially Clone and Debug)
- add test cases for the ScoredMigration ordering, node imbalance
calculation and the two rebalancing migration scoring methods
- s/score_best_balancing_migrations
/score_best_balancing_migration_candidates
to possibly allow the Scheduler/Usage impls handling the migration
candidate generation in the future instead of the callers
proxmox-resource-scheduling/src/node.rs | 17 ++
proxmox-resource-scheduling/src/scheduler.rs | 282 ++++++++++++++++++
.../tests/scheduler.rs | 169 ++++++++++-
3 files changed, 467 insertions(+), 1 deletion(-)
diff --git a/proxmox-resource-scheduling/src/node.rs b/proxmox-resource-scheduling/src/node.rs
index be462782..2dcef75e 100644
--- a/proxmox-resource-scheduling/src/node.rs
+++ b/proxmox-resource-scheduling/src/node.rs
@@ -29,6 +29,18 @@ impl NodeStats {
self.mem += resource_stats.maxmem;
}
+ /// Adds the resource stats to the node stats as if the resource is running on the node.
+ pub fn add_running_resource(&mut self, resource_stats: &ResourceStats) {
+ self.cpu += resource_stats.cpu;
+ self.mem += resource_stats.mem;
+ }
+
+ /// Removes the resource stats from the node stats as if the resource is not running on the node.
+ pub fn remove_running_resource(&mut self, resource_stats: &ResourceStats) {
+ self.cpu -= resource_stats.cpu;
+ self.mem = self.mem.saturating_sub(resource_stats.mem);
+ }
+
/// Returns the current cpu usage as a percentage.
pub fn cpu_load(&self) -> f64 {
self.cpu / self.maxcpu as f64
@@ -38,6 +50,11 @@ impl NodeStats {
pub fn mem_load(&self) -> f64 {
self.mem as f64 / self.maxmem as f64
}
+
+ /// Returns a combined node usage as a percentage.
+ pub fn load(&self) -> f64 {
+ (self.cpu_load() + self.mem_load()) / 2.0
+ }
}
/// A node in the cluster context.
diff --git a/proxmox-resource-scheduling/src/scheduler.rs b/proxmox-resource-scheduling/src/scheduler.rs
index 69dc6f4e..a25babad 100644
--- a/proxmox-resource-scheduling/src/scheduler.rs
+++ b/proxmox-resource-scheduling/src/scheduler.rs
@@ -2,6 +2,12 @@ use anyhow::Error;
use crate::{node::NodeStats, resource::ResourceStats, topsis};
+use serde::{Deserialize, Serialize};
+use std::{
+ cmp::{Ordering, Reverse},
+ collections::BinaryHeap,
+};
+
/// The scheduler view of a node.
#[derive(Clone, Debug)]
pub struct NodeUsage {
@@ -11,6 +17,36 @@ pub struct NodeUsage {
pub stats: NodeStats,
}
+/// Returns the load imbalance among the nodes.
+///
+/// The load balance is measured as the statistical dispersion of the individual node loads.
+///
+/// The current implementation uses the dimensionless coefficient of variation, which expresses the
+/// standard deviation in relation to the average mean of the node loads.
+///
+/// The coefficient of variation is not robust, which is a desired property here, because outliers
+/// should be detected as much as possible.
+fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
+ let node_count = nodes.len();
+ let node_loads = nodes.iter().map(to_load).collect::<Vec<_>>();
+
+ let load_sum = node_loads.iter().sum::<f64>();
+
+ // load_sum is guaranteed to be -0.0 for empty `nodes`
+ if load_sum == 0.0 {
+ 0.0
+ } else {
+ let load_mean = load_sum / node_count as f64;
+
+ let squared_diff_sum = node_loads
+ .iter()
+ .fold(0.0, |sum, node_load| sum + (node_load - load_mean).powi(2));
+ let load_sd = (squared_diff_sum / node_count as f64).sqrt();
+
+ load_sd / load_mean
+ }
+}
+
criteria_struct! {
/// A given alternative.
struct PveTopsisAlternative {
@@ -32,6 +68,83 @@ pub struct Scheduler {
nodes: Vec<NodeUsage>,
}
+/// A possible migration.
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct Migration {
+ /// The identifier of a leading resource.
+ pub sid: String,
+ /// The current node of the leading resource.
+ pub source_node: String,
+ /// The possible migration target node for the resource.
+ pub target_node: String,
+}
+
+/// A possible migration with a score.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+pub struct ScoredMigration {
+ /// The possible migration.
+ pub migration: Migration,
+ /// The expected node imbalance after the migration.
+ pub imbalance: f64,
+}
+
+impl Ord for ScoredMigration {
+ fn cmp(&self, other: &Self) -> Ordering {
+ self.imbalance
+ .total_cmp(&other.imbalance)
+ .then(self.migration.cmp(&other.migration))
+ }
+}
+
+impl PartialOrd for ScoredMigration {
+ fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+ Some(self.cmp(other))
+ }
+}
+
+impl PartialEq for ScoredMigration {
+ fn eq(&self, other: &Self) -> bool {
+ self.cmp(other) == Ordering::Equal
+ }
+}
+
+impl Eq for ScoredMigration {}
+
+impl ScoredMigration {
+ pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
+ // Depending how the imbalance is calculated, it can contain minor approximation errors. As
+ // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
+ // where the imbalance is the same up to the significant digits in base 10, but treated as
+ // different values.
+ //
+ // Therefore, truncate any non-significant digits to prevent these cases.
+ let factor = 10_f64.powf(f64::DIGITS as f64);
+ let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
+
+ Self {
+ migration: migration.into(),
+ imbalance: truncated_imbalance,
+ }
+ }
+}
+
+/// A possible migration candidate with the migrated usage stats.
+#[derive(Clone, Debug)]
+pub struct MigrationCandidate {
+ /// The possible migration.
+ pub migration: Migration,
+ /// The to-be-migrated resource usage stats.
+ pub stats: ResourceStats,
+}
+
+impl From<MigrationCandidate> for Migration {
+ fn from(candidate: MigrationCandidate) -> Self {
+ candidate.migration
+ }
+}
+
impl Scheduler {
/// Instantiate scheduler instance from node usages.
pub fn from_nodes<I>(nodes: I) -> Self
@@ -81,6 +194,123 @@ impl Scheduler {
}
}
+ /// Returns the load imbalance among the nodes.
+ ///
+ /// See [`calculate_node_imbalance`] for more information.
+ pub fn node_imbalance(&self) -> f64 {
+ calculate_node_imbalance(&self.nodes, |node| node.stats.load())
+ }
+
+ /// Returns the load imbalance among the nodes as if a specific resource was moved.
+ ///
+ /// See [`calculate_node_imbalance`] for more information.
+ fn node_imbalance_with_migration_candidate(&self, candidate: &MigrationCandidate) -> f64 {
+ calculate_node_imbalance(&self.nodes, |node| {
+ let mut new_stats = node.stats;
+
+ if node.name == candidate.migration.source_node {
+ new_stats.remove_running_resource(&candidate.stats);
+ } else if node.name == candidate.migration.target_node {
+ new_stats.add_running_resource(&candidate.stats);
+ }
+
+ new_stats.load()
+ })
+ }
+
+ /// Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+ /// done whether the given nodenames actually exist in the scheduler.
+ ///
+ /// The scoring is done as if each resource migration has already been done. This assumes that
+ /// the already migrated resource consumes the same amount of each stat as on the previous node
+ /// according to its `stats`.
+ ///
+ /// Returns up to `limit` of the best scored migrations.
+ pub fn score_best_balancing_migration_candidates<I>(
+ &self,
+ candidates: I,
+ limit: usize,
+ ) -> Vec<ScoredMigration>
+ where
+ I: IntoIterator<Item = MigrationCandidate>,
+ {
+ let mut scored_migrations = candidates
+ .into_iter()
+ .map(|candidate| {
+ let imbalance = self.node_imbalance_with_migration_candidate(&candidate);
+
+ Reverse(ScoredMigration::new(candidate, imbalance))
+ })
+ .collect::<BinaryHeap<_>>();
+
+ let mut best_migrations = Vec::with_capacity(limit);
+
+ // BinaryHeap::into_iter_sorted() is still in nightly unfortunately
+ while best_migrations.len() < limit {
+ match scored_migrations.pop() {
+ Some(Reverse(alternative)) => best_migrations.push(alternative),
+ None => break,
+ }
+ }
+
+ best_migrations
+ }
+
+ /// Scores the given migration `candidates` by the best node imbalance improvement with the
+ /// TOPSIS method.
+ ///
+ /// The `candidates` are assumed to be consistent with the scheduler. No further validation is
+ /// done whether the given nodenames actually exist in the scheduler.
+ ///
+ /// The scoring is done as if each resource migration has already been done. This assumes that
+ /// the already migrated resource consumes the same amount of each stat as on the previous node
+ /// according to its `stats`.
+ ///
+ /// Returns up to `limit` of the best scored migrations.
+ pub fn score_best_balancing_migration_candidates_topsis(
+ &self,
+ candidates: &[MigrationCandidate],
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let matrix = candidates
+ .iter()
+ .map(|candidate| {
+ let resource_stats = &candidate.stats;
+ let source_node = &candidate.migration.source_node;
+ let target_node = &candidate.migration.target_node;
+
+ self.topsis_alternative_with(|node| {
+ let mut new_stats = node.stats;
+
+ if &node.name == source_node {
+ new_stats.remove_running_resource(resource_stats);
+ } else if &node.name == target_node {
+ new_stats.add_running_resource(resource_stats);
+ }
+
+ new_stats
+ })
+ .into()
+ })
+ .collect::<Vec<_>>();
+
+ let best_alternatives =
+ topsis::rank_alternatives(&topsis::Matrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+ Ok(best_alternatives
+ .into_iter()
+ .take(limit)
+ .map(|i| {
+ let imbalance = self.node_imbalance_with_migration_candidate(&candidates[i]);
+
+ ScoredMigration::new(candidates[i].clone(), imbalance)
+ })
+ .collect())
+ }
+
/// Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// The scoring is done as if the resource is already started on each node. This assumes that
@@ -122,3 +352,55 @@ impl Scheduler {
.collect())
}
}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_scored_migration_order() {
+ let migration1 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:102"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ },
+ 0.7231749488916931,
+ );
+ let migration2 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:102"),
+ source_node: String::from("node1"),
+ target_node: String::from("node3"),
+ },
+ 0.723174948891693,
+ );
+ let migration3 = ScoredMigration::new(
+ Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ },
+ 0.723174948891693 + 1e-15,
+ );
+
+ let mut migrations = vec![migration2.clone(), migration3.clone(), migration1.clone()];
+
+ migrations.sort();
+
+ assert_eq!(
+ vec![migration1.clone(), migration2.clone(), migration3.clone()],
+ migrations
+ );
+
+ let mut heap = BinaryHeap::from(vec![
+ Reverse(migration2.clone()),
+ Reverse(migration3.clone()),
+ Reverse(migration1.clone()),
+ ]);
+
+ assert_eq!(heap.pop(), Some(Reverse(migration1)));
+ assert_eq!(heap.pop(), Some(Reverse(migration2)));
+ assert_eq!(heap.pop(), Some(Reverse(migration3)));
+ }
+}
diff --git a/proxmox-resource-scheduling/tests/scheduler.rs b/proxmox-resource-scheduling/tests/scheduler.rs
index c7a9dab9..8672f40d 100644
--- a/proxmox-resource-scheduling/tests/scheduler.rs
+++ b/proxmox-resource-scheduling/tests/scheduler.rs
@@ -2,9 +2,13 @@ use anyhow::Error;
use proxmox_resource_scheduling::{
node::NodeStats,
resource::ResourceStats,
- scheduler::{NodeUsage, Scheduler},
+ scheduler::{Migration, MigrationCandidate, NodeUsage, Scheduler, ScoredMigration},
};
+fn new_empty_cluster_scheduler() -> Scheduler {
+ Scheduler::from_nodes(Vec::<NodeUsage>::new())
+}
+
fn new_homogeneous_cluster_scheduler() -> Scheduler {
let (maxcpu, maxmem) = (16, 64 * (1 << 30));
@@ -75,6 +79,169 @@ fn new_heterogeneous_cluster_scheduler() -> Scheduler {
Scheduler::from_nodes(vec![node1, node2, node3])
}
+#[test]
+fn test_node_imbalance_with_empty_cluster() {
+ let scheduler = new_empty_cluster_scheduler();
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+#[test]
+fn test_node_imbalance_with_perfectly_balanced_cluster() {
+ let node = NodeUsage {
+ name: String::from("node1"),
+ stats: NodeStats {
+ cpu: 1.7,
+ maxcpu: 16,
+ mem: 224395264,
+ maxmem: 68719476736,
+ },
+ };
+
+ let scheduler = Scheduler::from_nodes(vec![node.clone()]);
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+
+ let scheduler = Scheduler::from_nodes(vec![node.clone(), node.clone(), node]);
+
+ assert_eq!(scheduler.node_imbalance(), 0.0);
+}
+
+fn new_simple_migration_candidates() -> (Vec<MigrationCandidate>, Migration, Migration) {
+ let migration1 = Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node2"),
+ };
+ let migration2 = Migration {
+ sid: String::from("vm:101"),
+ source_node: String::from("node1"),
+ target_node: String::from("node3"),
+ };
+ let stats = ResourceStats {
+ cpu: 0.7,
+ maxcpu: 4.0,
+ mem: 8 << 30,
+ maxmem: 16 << 30,
+ };
+
+ let candidates = vec![
+ MigrationCandidate {
+ migration: migration1.clone(),
+ stats,
+ },
+ MigrationCandidate {
+ migration: migration2.clone(),
+ stats,
+ },
+ ];
+
+ (candidates, migration1, migration2)
+}
+
+fn assert_imbalance(imbalance: f64, expected_imbalance: f64) {
+ assert!(
+ (expected_imbalance - imbalance).abs() <= f64::EPSILON,
+ "imbalance is {imbalance}, but was expected to be {expected_imbalance}"
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_with_no_candidates() {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(vec![], 2),
+ vec![]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_homogeneous_cluster() {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(candidates, 2),
+ vec![
+ ScoredMigration::new(migration2.clone(), 0.5972874658664057),
+ ScoredMigration::new(migration1.clone(), 0.7239828690397611)
+ ]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_in_heterogeneous_cluster() {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates(candidates, 2),
+ vec![
+ ScoredMigration::new(migration2, 0.525031850557711),
+ ScoredMigration::new(migration1, 0.5794177040605537)
+ ]
+ );
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_with_no_candidates() -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&vec![], 2)?,
+ vec![]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_homogeneous_cluster(
+) -> Result<(), Error> {
+ let scheduler = new_homogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.4893954724628247);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&candidates, 2)?,
+ vec![
+ ScoredMigration::new(migration1.clone(), 0.7239828690397611),
+ ScoredMigration::new(migration2.clone(), 0.5972874658664057),
+ ]
+ );
+
+ Ok(())
+}
+
+#[test]
+fn test_score_best_balancing_migration_candidates_topsis_in_heterogeneous_cluster(
+) -> Result<(), Error> {
+ let scheduler = new_heterogeneous_cluster_scheduler();
+
+ assert_imbalance(scheduler.node_imbalance(), 0.33026013056867354);
+
+ let (candidates, migration1, migration2) = new_simple_migration_candidates();
+
+ assert_eq!(
+ scheduler.score_best_balancing_migration_candidates_topsis(&candidates, 2)?,
+ vec![
+ ScoredMigration::new(migration1, 0.5794177040605537),
+ ScoredMigration::new(migration2, 0.525031850557711),
+ ]
+ );
+
+ Ok(())
+}
+
fn rank_nodes_to_start_resource(
scheduler: &Scheduler,
resource_stats: ResourceStats,
--
2.47.3
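The coefficient-of-variation metric introduced in this patch can be sketched standalone. This is a simplified illustration, not the crate's API: `node_imbalance` here takes plain load fractions rather than `NodeUsage` values plus a closure.

```rust
// Sketch of the imbalance metric: the coefficient of variation, i.e. the
// standard deviation of the node loads divided by their mean.
fn node_imbalance(loads: &[f64]) -> f64 {
    let n = loads.len() as f64;
    let sum: f64 = loads.iter().sum();
    if sum == 0.0 {
        return 0.0; // also covers the empty-cluster case
    }
    let mean = sum / n;
    let var = loads.iter().map(|&l| (l - mean).powi(2)).sum::<f64>() / n;
    var.sqrt() / mean
}

fn main() {
    // Perfectly balanced nodes have (numerically near-)zero imbalance.
    assert!(node_imbalance(&[0.25, 0.25, 0.25]) < 1e-12);
    // An empty cluster yields zero as well.
    assert_eq!(node_imbalance(&[]), 0.0);
    println!("ok");
}
```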
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-26 10:34 ` Dominik Rusovac
2026-03-26 14:11 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-26 10:34 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
pls find my comments inline, mostly relating to nits or tiny things
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> Assuming that a resource will hold the same dynamic resource usage on a
> new node as on the previous node, score possible migrations, where:
>
> - the cluster node imbalance is minimal (bruteforce), or
> - the shifted root mean square and maximum resource usages of the cpu
> and memory is minimal across the cluster nodes (TOPSIS).
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - add saturating_sub() in remove_running_resource(...) (as suggested by
> @Thomas)
> - slightly move declarations and impls around so that reading from
> top-to-bottom is a little easier
> - pass NodeUsage vec instead of NodeStats vec to
> calculate_node_imbalance(...)
> - pass a closure to calculate_node_imbalance(...) (as suggested by
> @Dominik)
> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
> struct is now ordered first by the imbalance and then the strings in
> the `Migration` struct
> - fix floating-point issue for the imbalance ordering for
> ScoredMigration
> - correctly implement `Ord` (essentially removing the reverse() and
> moving these Reverse() wrappers to the usages for the BinaryHeap)
> - use the `Migration` struct in `MigrationCandidate` as well
> - drop Scheduler::node_stats() as it's unused now
> - use Vec::with_capacity(...) where possible
> - eagerly implement common traits (especially Clone and Debug)
> - add test cases for the ScoredMigration ordering, node imbalance
> calculation and the two rebalancing migration scoring methods
> - s/score_best_balancing_migrations
> /score_best_balancing_migration_candidates
> to possibly allow the Scheduler/Usage impls handling the migration
> candidate generation in the future instead of the callers
[snip]
> +/// Returns the load imbalance among the nodes.
> +///
> +/// The load balance is measured as the statistical dispersion of the individual node loads.
> +///
> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
> +/// standard deviation in relation to the average mean of the node loads.
> +///
> +/// The coefficient of variation is not robust, which is a desired property here, because outliers
> +/// should be detected as much as possible.
> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
very nice docs!
[snip]
> +/// A possible migration.
> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
> +#[serde(rename_all = "kebab-case")]
> +pub struct Migration {
> + /// The identifier of a leading resource.
> + pub sid: String,
> + /// The current node of the leading resource.
> + pub source_node: String,
> + /// The possible migration target node for the resource.
> + pub target_node: String,
nit: in the long run, instead of having `ScoredMigration`,
it could be more convenient to have a field:
pub imbalance: Option<f64>,
> +}
[snip]
> +impl ScoredMigration {
> + pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
> + // Depending how the imbalance is calculated, it can contain minor approximation errors. As
// Depending [on] how [...]
> + // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
> + // where the imbalance is the same up to the significant digits in base 10, but treated as
> + // different values.
> + //
> + // Therefore, truncate any non-significant digits to prevent these cases.
> + let factor = 10_f64.powf(f64::DIGITS as f64);
> + let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
Nice solution; this appears to be a clean approach to achieve a deterministic `Ord`
for `f64`.
One small thing, though: `f64::DIGITS` is technically not a floating-point number, but `15_u32`.
let factor = 10_f64.powi(f64::DIGITS as i32);
thus seems to be the better choice here. `powi` is also generally faster than `powf` [0].
[0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=Using%20this%20function%20is%20generally%20faster%20than%20using%20powf
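A quick standalone check of the suggested `powi` variant (a sketch; the tolerance is illustrative):

```rust
fn main() {
    // f64::DIGITS is 15_u32, so the integer-exponent powi fits naturally.
    let factor = 10_f64.powi(f64::DIGITS as i32);
    // 10^15 is an integer below 2^53, so it is exactly representable.
    assert_eq!(factor, 1e15);

    // Truncation keeps roughly 15 significant decimal digits, so the
    // truncated value stays very close to the original imbalance.
    let imbalance = 0.7231749488916931_f64;
    let truncated = f64::trunc(factor * imbalance) / factor;
    assert!((truncated - imbalance).abs() < 2e-15);
    println!("factor={factor}, truncated={truncated}");
}
```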
[snip]
> +/// A possible migration candidate with the migrated usage stats.
> +#[derive(Clone, Debug)]
> +pub struct MigrationCandidate {
> + /// The possible migration.
> + pub migration: Migration,
> + /// The to-be-migrated resource usage stats.
imo, easier to comprehend:
/// Usage stats of the resource to be migrated
> + pub stats: ResourceStats,
> +}
[snip]
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-26 10:34 ` Dominik Rusovac
@ 2026-03-26 14:11 ` Daniel Kral
2026-03-27 9:34 ` Dominik Rusovac
0 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 14:11 UTC (permalink / raw)
To: Dominik Rusovac, pve-devel
On Thu Mar 26, 2026 at 11:34 AM CET, Dominik Rusovac wrote:
> lgtm
>
> pls find my comments inline, mostly relating to nits or tiny things
>
> On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
>> Assuming that a resource will hold the same dynamic resource usage on a
>> new node as on the previous node, score possible migrations, where:
>>
>> - the cluster node imbalance is minimal (bruteforce), or
>> - the shifted root mean square and maximum resource usages of the cpu
>> and memory is minimal across the cluster nodes (TOPSIS).
>>
>> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
>> ---
>> changes v1 -> v2:
>> - add saturating_sub() in remove_running_resource(...) (as suggested by
>> @Thomas)
>> - slightly move declarations and impls around so that reading from
>> top-to-bottom is a little easier
>> - pass NodeUsage vec instead of NodeStats vec to
>> calculate_node_imbalance(...)
>> - pass a closure to calculate_node_imbalance(...) (as suggested by
>> @Dominik)
>> - also use `migration` for `Ord` impl of `ScoredMigration`, s.t. the
>> struct is now ordered first by the imbalance and then the strings in
>> the `Migration` struct
>> - fix floating-point issue for the imbalance ordering for
>> ScoredMigration
>> - correctly implement `Ord` (essentially removing the reverse() and
>> moving these Reverse() wrappers to the usages for the BinaryHeap)
>> - use the `Migration` struct in `MigrationCandidate` as well
>> - drop Scheduler::node_stats() as it's unused now
>> - use Vec::with_capacity(...) where possible
>> - eagerly implement common traits (especially Clone and Debug)
>> - add test cases for the ScoredMigration ordering, node imbalance
>> calculation and the two rebalancing migration scoring methods
>> - s/score_best_balancing_migrations
>> /score_best_balancing_migration_candidates
>> to possibly allow the Scheduler/Usage impls handling the migration
>> candidate generation in the future instead of the callers
>
> [snip]
>
>> +/// Returns the load imbalance among the nodes.
>> +///
>> +/// The load balance is measured as the statistical dispersion of the individual node loads.
>> +///
>> +/// The current implementation uses the dimensionless coefficient of variation, which expresses the
>> +/// standard deviation in relation to the average mean of the node loads.
>> +///
>> +/// The coefficient of variation is not robust, which is a desired property here, because outliers
>> +/// should be detected as much as possible.
>> +fn calculate_node_imbalance(nodes: &[NodeUsage], to_load: impl Fn(&NodeUsage) -> f64) -> f64 {
>
> very nice docs!
>
> [snip]
>
>> +/// A possible migration.
>> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
>> +#[serde(rename_all = "kebab-case")]
>> +pub struct Migration {
>> + /// The identifier of a leading resource.
>> + pub sid: String,
>> + /// The current node of the leading resource.
>> + pub source_node: String,
>> + /// The possible migration target node for the resource.
>> + pub target_node: String,
>
> nit: on the long run, instead of having `ScoredMigration`,
> it could be more convenient to have a field:
>
> pub imbalance: Option<f64>,
>
Might make sense, but then we can't reuse the same structure in
`MigrationCandidate` anymore, so I would let ScoredMigration be its own
type. What do you think?
>> +}
>
> [snip]
>
>> +impl ScoredMigration {
>> + pub fn new<T: Into<Migration>>(migration: T, imbalance: f64) -> Self {
>> + // Depending how the imbalance is calculated, it can contain minor approximation errors. As
>
> // Depending [on] how [...]
Thanks, will change that!
>
>> + // this struct implements the Ord trait, users of the struct's cmp() can run into cases,
>> + // where the imbalance is the same up to the significant digits in base 10, but treated as
>> + // different values.
>> + //
>> + // Therefore, truncate any non-significant digits to prevent these cases.
>> + let factor = 10_f64.powf(f64::DIGITS as f64);
>> + let truncated_imbalance = f64::trunc(factor * imbalance) / factor;
>
> Nice solution, this appears to be a clean approach to achieve deterministic `Ord`
> for `f64`.
>
> One small thing, tho: `f64::DIGITS` is technically not a floating number, but `15_u32`.
>
> let factor = 10_f64.powi(f64::DIGITS as i32);
>
> thus, seems to be the better choice here. `powi` is also generally faster than `powf` [0].
>
> [0] https://doc.rust-lang.org/std/primitive.f64.html#method.powi:~:text=Using%20this%20function%20is%20generally%20faster%20than%20using%20powf
Thanks, good catch! I also wasn't aware of this being non-deterministic
here, that's good to know.
We also briefly talked about this off-list; I'll adapt the test cases
below to not eq() the ScoredMigration directly, as the truncation here
might be non-deterministic (I had assumed otherwise).
Moreover, it's more important to verify the order in which the migrations
are scored than their exact imbalance score, as you suggested.
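An order-only assertion could be sketched like this (illustrative names and values, not the crate's API):

```rust
fn main() {
    // Score two hypothetical migrations; only the relative order matters,
    // not the exact imbalance values.
    let mut scored = vec![
        ("vm:101 -> node2", 0.7239828690397611_f64),
        ("vm:101 -> node3", 0.5972874658664057_f64),
    ];
    // Sort ascending by imbalance using a total order on f64.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    let order: Vec<&str> = scored.iter().map(|(m, _)| *m).collect();
    assert_eq!(order, ["vm:101 -> node3", "vm:101 -> node2"]);
    println!("{order:?}");
}
```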
>
> [snip]
>
>> +/// A possible migration candidate with the migrated usage stats.
>> +#[derive(Clone, Debug)]
>> +pub struct MigrationCandidate {
>> + /// The possible migration.
>> + pub migration: Migration,
>> + /// The to-be-migrated resource usage stats.
>
> imo, easier to comprehend:
>
> /// Usage stats of the resource to be migrated
Nice, will change it to that!
>
>> + pub stats: ResourceStats,
>> +}
>
> [snip]
* Re: [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection
2026-03-26 14:11 ` Daniel Kral
@ 2026-03-27 9:34 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:34 UTC (permalink / raw)
To: Daniel Kral, pve-devel
On Thu Mar 26, 2026 at 3:11 PM CET, Daniel Kral wrote:
[snip]
>>> +/// A possible migration.
>>> +#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Serialize, Deserialize)]
>>> +#[serde(rename_all = "kebab-case")]
>>> +pub struct Migration {
>>> + /// The identifier of a leading resource.
>>> + pub sid: String,
>>> + /// The current node of the leading resource.
>>> + pub source_node: String,
>>> + /// The possible migration target node for the resource.
>>> + pub target_node: String,
>>
>> nit: on the long run, instead of having `ScoredMigration`,
>> it could be more convenient to have a field:
>>
>> pub imbalance: Option<f64>,
>>
>
> Might make sense, but then we can't reuse the same structure in
> `MigrationCandidate` anymore, so I would let ScoredMigration be its own
> type. What do you think?
>
ok, then let's keep it as-is
[snip]
* [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (8 preceding siblings ...)
2026-03-24 18:29 ` [PATCH proxmox v2 09/40] resource-scheduling: implement rebalancing migration selection Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:38 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
` (29 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The error can only happen due to an error in
add_service_usage_to_node(...), but it prevents all of the following
service_nodes entries from being cleaned up correctly.
While technically an API break, removing the error does not change any
callers, which do not handle the error anyway. Additionally,
remove_node(...) is only used in testing code in this package and
pve-ha-manager, and is currently unused in production code.
This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 5b91d36..6e57b9d 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -75,25 +75,16 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Remove a node from the scheduler.
#[export]
- pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> Result<(), Error> {
+ pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
let mut usage = this.inner.lock().unwrap();
if let Some(node) = usage.nodes.remove(nodename) {
for (sid, _) in node.services.iter() {
- match usage.service_nodes.get_mut(sid) {
- Some(service_nodes) => {
- service_nodes.remove(nodename);
- }
- None => bail!(
- "service '{}' not present in service_nodes hashmap while removing node '{}'",
- sid,
- nodename
- ),
+ if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
+ service_nodes.remove(nodename);
}
}
}
-
- Ok(())
}
/// Method: Get a list of all the nodes in the scheduler.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
@ 2026-03-27 9:38 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:38 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The error can only happen due to an error in
> add_service_usage_to_node(...), but it prevents all of the following
> service_nodes entries from being cleaned up correctly.
>
> While technically an API break, removing the error does not change any
> callers, which do not handle the error anyway. Additionally,
> remove_node(...) is only used in testing code in this package and
> pve-ha-manager, and is currently unused in production code.
>
> This change makes the implementation more consistent with the new
> proxmox_resource_scheduling::usage::Usage, which will replace this in
> a following patch.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (9 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 10/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_node Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:39 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
` (28 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The error can only happen due to an error in
add_service_usage_to_node(...), but it prevents all of the following
node services entries from being cleaned up correctly.
While technically an API break, removing the error does not change any
callers.
This change makes the implementation more consistent with the new
proxmox_resource_scheduling::usage::Usage, which will replace this in
a following patch.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
pve-rs/src/bindings/resource_scheduling_static.rs | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling_static.rs
index 6e57b9d..b8eac57 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling_static.rs
@@ -145,25 +145,16 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Remove service `sid` and its usage from all assigned nodes.
#[export]
- fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) -> Result<(), Error> {
+ fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
let mut usage = this.inner.lock().unwrap();
if let Some(nodes) = usage.service_nodes.remove(sid) {
for nodename in &nodes {
- match usage.nodes.get_mut(nodename) {
- Some(node) => {
- node.services.remove(sid);
- }
- None => bail!(
- "service '{}' not present in usage hashmap on node '{}'",
- sid,
- nodename
- ),
+ if let Some(node) = usage.nodes.get_mut(nodename) {
+ node.services.remove(sid);
}
}
}
-
- Ok(())
}
/// Scores all previously added nodes for starting a `service` on.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
@ 2026-03-27 9:39 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:39 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The error can only happen due to an error in
> add_service_usage_to_node(...), but it prevents all of the following
> node services entries from being cleaned up correctly.
>
> While technically an API break, removing the error does not change any
> callers.
>
> This change makes the implementation more consistent with the new
> proxmox_resource_scheduling::usage::Usage, which will replace this in
> a following patch.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (10 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 11/40] pve-rs: resource-scheduling: remove pedantic error handling from remove_service_usage Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 9:41 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
` (27 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
This is in preparation for adding the upcoming pve_dynamic bindings, which
share much of the same code paths as the pve_static implementation.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- move it in front of other changes done to pve_static so the code that
is shared with the upcoming pve_dynamic can already be put in separate
modules (as suggested by @Thomas)
- add more context and motivation to patch message
pve-rs/src/bindings/mod.rs | 3 +--
pve-rs/src/bindings/resource_scheduling/mod.rs | 4 ++++
.../pve_static.rs} | 2 +-
3 files changed, 6 insertions(+), 3 deletions(-)
create mode 100644 pve-rs/src/bindings/resource_scheduling/mod.rs
rename pve-rs/src/bindings/{resource_scheduling_static.rs => resource_scheduling/pve_static.rs} (98%)
diff --git a/pve-rs/src/bindings/mod.rs b/pve-rs/src/bindings/mod.rs
index c21b328..853a3dd 100644
--- a/pve-rs/src/bindings/mod.rs
+++ b/pve-rs/src/bindings/mod.rs
@@ -3,8 +3,7 @@
mod oci;
pub use oci::pve_rs_oci;
-mod resource_scheduling_static;
-pub use resource_scheduling_static::pve_rs_resource_scheduling_static;
+pub mod resource_scheduling;
mod tfa;
pub use tfa::pve_rs_tfa;
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
new file mode 100644
index 0000000..af1fb6b
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -0,0 +1,4 @@
+//! Resource scheduling related bindings.
+
+mod pve_static;
+pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
similarity index 98%
rename from pve-rs/src/bindings/resource_scheduling_static.rs
rename to pve-rs/src/bindings/resource_scheduling/pve_static.rs
index b8eac57..a83a9ab 100644
--- a/pve-rs/src/bindings/resource_scheduling_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -2,7 +2,7 @@
pub mod pve_rs_resource_scheduling_static {
//! The `PVE::RS::ResourceScheduling::Static` package.
//!
- //! Provides bindings for the resource scheduling module.
+ //! Provides bindings for the static resource scheduling module.
//!
//! See [`proxmox_resource_scheduling`].
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
@ 2026-03-27 9:41 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 9:41 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is in preparation for adding the upcoming pve_dynamic bindings, which
> share much of the same code paths as the pve_static implementation.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - move it in front of other changes done to pve_static so the code that
> is shared with the upcoming pve_dynamic can already be put in separate
> modules (as suggested by @Thomas)
> - add more context and motivation to patch message
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (11 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 12/40] pve-rs: resource-scheduling: move pve_static into resource_scheduling module Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:13 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
` (26 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The proxmox_resource_scheduling crate provides a generic usage
implementation, which is backwards compatible with the pve_static
bindings. This reduces the static resource scheduling bindings to a
slightly thinner wrapper.
This also exposes the new `add_resource(...)` binding, which allows
callers to add services with additional state other than the usage
stats. It is exposed as `add_service(...)` to be consistent with the
naming of the rest of the existing methods.
Where it is sensible for the bindings, the documentation is extended
with a link to the documentation of the underlying methods.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add patch message for context
- change from only creating the
proxmox_resource_scheduling::scheduler::ClusterUsage (now,
proxmox_resource_scheduling::scheduler::Scheduler), to using the new
but backwards-compatible `Usage` implementation instead
- this essentially also squashes the 'store services stats independently
of node' patch in here as this is also tracked by the generic `Usage`
impl
- add `usage` and `resource` modules for shared code
.../src/bindings/resource_scheduling/mod.rs | 3 +
.../resource_scheduling/pve_static.rs | 152 ++++++------------
.../bindings/resource_scheduling/resource.rs | 44 +++++
.../src/bindings/resource_scheduling/usage.rs | 33 ++++
4 files changed, 132 insertions(+), 100 deletions(-)
create mode 100644 pve-rs/src/bindings/resource_scheduling/resource.rs
create mode 100644 pve-rs/src/bindings/resource_scheduling/usage.rs
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index af1fb6b..9ce631c 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -1,4 +1,7 @@
//! Resource scheduling related bindings.
+mod resource;
+mod usage;
+
mod pve_static;
pub use pve_static::pve_rs_resource_scheduling_static;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index a83a9ab..3d9f142 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -6,40 +6,34 @@ pub mod pve_rs_resource_scheduling_static {
//!
//! See [`proxmox_resource_scheduling`].
- use std::collections::{HashMap, HashSet};
use std::sync::Mutex;
- use anyhow::{Error, bail};
+ use anyhow::Error;
use perlmod::Value;
- use proxmox_resource_scheduling::pve_static::{StaticNodeUsage, StaticServiceUsage};
+ use proxmox_resource_scheduling::node::NodeStats;
+ use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+ use proxmox_resource_scheduling::usage::Usage;
+
+ use crate::bindings::resource_scheduling::{
+ resource::PveResource, usage::StartedResourceAggregator,
+ };
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
- struct StaticNodeInfo {
- name: String,
- maxcpu: usize,
- maxmem: usize,
- services: HashMap<String, StaticServiceUsage>,
- }
-
- struct Usage {
- nodes: HashMap<String, StaticNodeInfo>,
- service_nodes: HashMap<String, HashSet<String>>,
- }
-
- /// A scheduler instance contains the resource usage by node.
+ /// A scheduler instance contains the cluster usage.
pub struct Scheduler {
inner: Mutex<Usage>,
}
+ type StaticResource = PveResource<StaticServiceUsage>;
+
/// Class method: Create a new [`Scheduler`] instance.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::new`].
#[export(raw_return)]
pub fn new(#[raw] class: Value) -> Result<Value, Error> {
- let inner = Usage {
- nodes: HashMap::new(),
- service_nodes: HashMap::new(),
- };
+ let inner = Usage::new();
Ok(perlmod::instantiate_magic!(
&class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
@@ -48,7 +42,7 @@ pub mod pve_rs_resource_scheduling_static {
/// Method: Add a node with its basic CPU and memory info.
///
- /// This inserts a [`StaticNodeInfo`] entry for the node into the scheduler instance.
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
#[export]
pub fn add_node(
#[try_from_ref] this: &Scheduler,
@@ -58,33 +52,24 @@ pub mod pve_rs_resource_scheduling_static {
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
- if usage.nodes.contains_key(&nodename) {
- bail!("node {} already added", nodename);
- }
-
- let node = StaticNodeInfo {
- name: nodename.clone(),
+ let stats = NodeStats {
+ cpu: 0.0,
maxcpu,
+ mem: 0,
maxmem,
- services: HashMap::new(),
};
- usage.nodes.insert(nodename, node);
- Ok(())
+ usage.add_node(nodename, stats)
}
/// Method: Remove a node from the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
#[export]
pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
let mut usage = this.inner.lock().unwrap();
- if let Some(node) = usage.nodes.remove(nodename) {
- for (sid, _) in node.services.iter() {
- if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
- service_nodes.remove(nodename);
- }
- }
- }
+ usage.remove_node(nodename);
}
/// Method: Get a list of all the nodes in the scheduler.
@@ -93,8 +78,7 @@ pub mod pve_rs_resource_scheduling_static {
let usage = this.inner.lock().unwrap();
usage
- .nodes
- .keys()
+ .nodenames_iter()
.map(|nodename| nodename.to_string())
.collect()
}
@@ -104,10 +88,26 @@ pub mod pve_rs_resource_scheduling_static {
pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
let usage = this.inner.lock().unwrap();
- usage.nodes.contains_key(nodename)
+ usage.contains_node(nodename)
+ }
+
+ /// Method: Add `service` with identifier `sid` to the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+ #[export]
+ pub fn add_service(
+ #[try_from_ref] this: &Scheduler,
+ sid: String,
+ service: StaticResource,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_resource(sid, service.try_into()?)
}
/// Method: Add service `sid` and its `service_usage` to the node.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource_usage_to_node`].
#[export]
pub fn add_service_usage_to_node(
#[try_from_ref] this: &Scheduler,
@@ -117,81 +117,33 @@ pub mod pve_rs_resource_scheduling_static {
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
- match usage.nodes.get_mut(nodename) {
- Some(node) => {
- if node.services.contains_key(sid) {
- bail!("service '{}' already added to node '{}'", sid, nodename);
- }
-
- node.services.insert(sid.to_string(), service_usage);
- }
- None => bail!("node '{}' not present in usage hashmap", nodename),
- }
-
- if let Some(service_nodes) = usage.service_nodes.get_mut(sid) {
- if service_nodes.contains(nodename) {
- bail!("node '{}' already added to service '{}'", nodename, sid);
- }
-
- service_nodes.insert(nodename.to_string());
- } else {
- let mut service_nodes = HashSet::new();
- service_nodes.insert(nodename.to_string());
- usage.service_nodes.insert(sid.to_string(), service_nodes);
- }
-
- Ok(())
+ // TODO Only for backwards compatibility, can be removed with a proper version bump
+ #[allow(deprecated)]
+ usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
}
/// Method: Remove service `sid` and its usage from all assigned nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
#[export]
fn remove_service_usage(#[try_from_ref] this: &Scheduler, sid: &str) {
let mut usage = this.inner.lock().unwrap();
- if let Some(nodes) = usage.service_nodes.remove(sid) {
- for nodename in &nodes {
- if let Some(node) = usage.nodes.get_mut(nodename) {
- node.services.remove(sid);
- }
- }
- }
+ usage.remove_resource(sid);
}
- /// Scores all previously added nodes for starting a `service` on.
+ /// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
///
- /// Scoring is done according to the static memory and CPU usages of the nodes as if the
- /// service would already be running on each.
- ///
- /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
- /// score is better.
- ///
- /// See [`proxmox_resource_scheduling::pve_static::score_nodes_to_start_service`].
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
#[export]
pub fn score_nodes_to_start_service(
#[try_from_ref] this: &Scheduler,
- service: StaticServiceUsage,
+ service_stats: StaticServiceUsage,
) -> Result<Vec<(String, f64)>, Error> {
let usage = this.inner.lock().unwrap();
- let nodes = usage
- .nodes
- .values()
- .map(|node| {
- let mut node_usage = StaticNodeUsage {
- name: node.name.to_string(),
- cpu: 0.0,
- maxcpu: node.maxcpu,
- mem: 0,
- maxmem: node.maxmem,
- };
- for service in node.services.values() {
- node_usage.add_service_usage(service);
- }
-
- node_usage
- })
- .collect::<Vec<StaticNodeUsage>>();
-
- proxmox_resource_scheduling::pve_static::score_nodes_to_start_service(&nodes, &service)
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_nodes_to_start_resource(service_stats)
}
}
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
new file mode 100644
index 0000000..91d56b9
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -0,0 +1,44 @@
+use anyhow::{Error, bail};
+use proxmox_resource_scheduling::resource::{
+ Resource, ResourcePlacement, ResourceState, ResourceStats,
+};
+
+use serde::{Deserialize, Serialize};
+
+/// A PVE resource.
+#[derive(Serialize, Deserialize)]
+pub struct PveResource<T: Into<ResourceStats>> {
+ /// The resource's usage statistics.
+ stats: T,
+ /// Whether the resource is running.
+ running: bool,
+ /// The resource's current node.
+ current_node: Option<String>,
+ /// The resource's optional migration target node.
+ target_node: Option<String>,
+}
+
+impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
+ type Error = Error;
+
+ fn try_from(resource: PveResource<T>) -> Result<Self, Error> {
+ let state = if resource.running {
+ ResourceState::Started
+ } else {
+ ResourceState::Starting
+ };
+
+ let placement = match (resource.current_node, resource.target_node) {
+ (Some(current_node), Some(target_node)) => ResourcePlacement::Moving {
+ current_node,
+ target_node,
+ },
+ (Some(current_node), None) | (None, Some(current_node)) => {
+ ResourcePlacement::Stationary { current_node }
+ }
+ _ => bail!("neither current_node nor target_node are set"),
+ };
+
+ Ok(Resource::new(resource.stats.into(), state, placement))
+ }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
new file mode 100644
index 0000000..fc8b872
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -0,0 +1,33 @@
+use proxmox_resource_scheduling::{
+ scheduler::NodeUsage,
+ usage::{Usage, UsageAggregator},
+};
+
+/// An aggregator, which adds any resource as a started resource.
+///
+/// This aggregator is useful if the node base stats do not have any current usage.
+pub(crate) struct StartedResourceAggregator;
+
+impl UsageAggregator for StartedResourceAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| {
+ let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
+ let mut node_stats = node_stats;
+
+ if let Some(resource) = usage.get_resource(sid) {
+ node_stats.add_started_resource(&resource.stats());
+ }
+
+ node_stats
+ });
+
+ NodeUsage {
+ name: nodename.to_string(),
+ stats,
+ }
+ })
+ .collect()
+ }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-27 14:13 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:13 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm modulo the nits below
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The proxmox_resource_scheduling crate provides a generic usage
> implementation, which is backwards compatible with the pve_static
> bindings. This reduces the static resource scheduling bindings to a
> slightly thinner wrapper.
a good measure to make proxmox-resource-scheduling handle the usage tracking
>
> This also exposes the new `add_resource(...)` binding, which allows
> callers to add services with additional state other than the usage
> stats. It is exposed as `add_service(...)` to be consistent with the
> naming of the rest of the existing methods.
>
> Where it is sensible for the bindings, the documentation is extended
> with a link to the documentation of the underlying methods.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - add patch message for context
> - change from only creating the
> proxmox_resource_scheduling::scheduler::ClusterUsage (now,
> proxmox_resource_scheduling::scheduler::Scheduler), to using the new
> but backwards-compatible `Usage` implementation instead
> - this essentially also squashes the 'store services stats independently
> of node' patch in here as this is also tracked by the generic `Usage`
> impl
> - add `usage` and `resource` modules for shared code
[snip]
> +
> +impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
> + type Error = Error;
> +
> + fn try_from(resource: PveResource<T>) -> Result<Self, Error> {
> + let state = if resource.running {
> + ResourceState::Started
> + } else {
> + ResourceState::Starting
> + };
> +
> + let placement = match (resource.current_node, resource.target_node) {
as it came up off-list, we might want to prohibit current_node and
target_node being equal not only in proxmox-resource-scheduling, but
also here
> + (Some(current_node), Some(target_node)) => ResourcePlacement::Moving {
> + current_node,
> + target_node,
> + },
> + (Some(current_node), None) | (None, Some(current_node)) => {
it would be good to have a comment (// NOTE: ...) explaining the
reasoning behind this arm's code
> + ResourcePlacement::Stationary { current_node }
> + }
> + _ => bail!("neither current_node nor target_node are set"),
> + };
> +
> + Ok(Resource::new(resource.stats.into(), state, placement))
> + }
> +}
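The guard suggested above could look roughly like the following sketch. The enum and helper here are simplified stand-ins for illustration, not the series' actual ResourcePlacement/Resource types:

```rust
// Sketch: reject current_node == target_node when building the placement.
#[derive(Debug, PartialEq)]
enum Placement {
    Moving { current_node: String, target_node: String },
    Stationary { current_node: String },
}

fn placement(
    current_node: Option<String>,
    target_node: Option<String>,
) -> Result<Placement, String> {
    match (current_node, target_node) {
        // new guard arm: both nodes set, but equal
        (Some(c), Some(t)) if c == t => {
            Err(format!("current_node and target_node are both '{c}'"))
        }
        (Some(c), Some(t)) => Ok(Placement::Moving { current_node: c, target_node: t }),
        // either one set: the resource is stationary on that node
        (Some(c), None) | (None, Some(c)) => Ok(Placement::Stationary { current_node: c }),
        (None, None) => Err("neither current_node nor target_node are set".into()),
    }
}

fn main() {
    assert!(placement(Some("a".into()), Some("a".into())).is_err());
    assert_eq!(
        placement(Some("a".into()), None),
        Ok(Placement::Stationary { current_node: "a".into() })
    );
}
```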
[snip]
> +impl UsageAggregator for StartedResourceAggregator {
> + fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
> + usage
> + .nodes_iter()
> + .map(|(nodename, node)| {
nice fold!
nit: by making `node_stats` mutable in the first place, variable
shadowing can be avoided, see:
let stats = node.resources_iter().fold(node.stats(), |mut node_stats, sid| {
if let Some(resource) = usage.get_resource(sid) {
node_stats.add_started_resource(&resource.stats());
}
node_stats
});
> + let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
> + let mut node_stats = node_stats;
> +
> + if let Some(resource) = usage.get_resource(sid) {
> + node_stats.add_started_resource(&resource.stats());
> + }
> +
> + node_stats
> + });
> +
> + NodeUsage {
> + name: nodename.to_string(),
> + stats,
> + }
> + })
> + .collect()
> + }
> +}
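As a standalone illustration of the fold nit (a generic accumulator, not the crate's code): binding the accumulator as `mut` in the closure signature avoids re-declaring it via shadowing, with identical behavior.

```rust
fn main() {
    let usages = [1u64, 2, 3];

    // Shadowing variant (as in the patch): rebind the accumulator mutably.
    let total_shadowed = usages.iter().fold(0u64, |acc, u| {
        let mut acc = acc;
        acc += u;
        acc
    });

    // `mut` binding variant (as suggested): mutable from the start.
    let total = usages.iter().fold(0u64, |mut acc, u| {
        acc += u;
        acc
    });

    assert_eq!(total_shadowed, total);
    assert_eq!(total, 6);
}
```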
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (12 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 13/40] pve-rs: resource-scheduling: use generic usage implementation Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:18 ` Dominik Rusovac
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
` (25 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The StaticServiceUsage is now marked as deprecated in
proxmox-resource-scheduling to make the crate independent of the
specific usage structs and their deserialization.
Therefore, define the same struct in the pve_static bindings module.
Though this is technically a Rust API break, the Perl bindings do not
have the concept of structs, which are serialized as Perl hashes.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
Move towards handling (de)serialization in pve-rs and having the generic
impls in the proxmox-resource-scheduling crate.
.../resource_scheduling/pve_static.rs | 32 ++++++++++++++++---
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index 3d9f142..e2756db 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -9,10 +9,11 @@ pub mod pve_rs_resource_scheduling_static {
use std::sync::Mutex;
use anyhow::Error;
+ use serde::{Deserialize, Serialize};
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
- use proxmox_resource_scheduling::pve_static::StaticServiceUsage;
+ use proxmox_resource_scheduling::resource::ResourceStats;
use proxmox_resource_scheduling::usage::Usage;
use crate::bindings::resource_scheduling::{
@@ -26,7 +27,28 @@ pub mod pve_rs_resource_scheduling_static {
inner: Mutex<Usage>,
}
- type StaticResource = PveResource<StaticServiceUsage>;
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Static usage stats of a resource.
+ pub struct StaticResourceStats {
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<StaticResourceStats> for ResourceStats {
+ fn from(stats: StaticResourceStats) -> Self {
+ Self {
+ cpu: stats.maxcpu,
+ maxcpu: stats.maxcpu,
+ mem: stats.maxmem,
+ maxmem: stats.maxmem,
+ }
+ }
+ }
+
+ type StaticResource = PveResource<StaticResourceStats>;
/// Class method: Create a new [`Scheduler`] instance.
///
@@ -113,13 +135,13 @@ pub mod pve_rs_resource_scheduling_static {
#[try_from_ref] this: &Scheduler,
nodename: &str,
sid: &str,
- service_usage: StaticServiceUsage,
+ service_stats: StaticResourceStats,
) -> Result<(), Error> {
let mut usage = this.inner.lock().unwrap();
// TODO Only for backwards compatibility, can be removed with a proper version bump
#[allow(deprecated)]
- usage.add_resource_usage_to_node(nodename, sid, service_usage.into())
+ usage.add_resource_usage_to_node(nodename, sid, service_stats.into())
}
/// Method: Remove service `sid` and its usage from all assigned nodes.
@@ -138,7 +160,7 @@ pub mod pve_rs_resource_scheduling_static {
#[export]
pub fn score_nodes_to_start_service(
#[try_from_ref] this: &Scheduler,
- service_stats: StaticServiceUsage,
+ service_stats: StaticResourceStats,
) -> Result<Vec<(String, f64)>, Error> {
let usage = this.inner.lock().unwrap();
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
@ 2026-03-27 14:18 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:18 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The StaticServiceUsage is now marked as deprecated in
> proxmox-resource-scheduling to make the crate independent of the
> specific usage structs and their deserialization.
>
> Therefore, define the same struct in the pve_static bindings module.
>
> Though this is technically a Rust API break, the Perl bindings do not
> have the concept of structs, which are serialized as Perl hashes.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (13 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 14/40] pve-rs: resource-scheduling: static: replace deprecated usage structs Daniel Kral
@ 2026-03-24 18:29 ` Daniel Kral
2026-03-27 14:15 ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
` (24 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:29 UTC (permalink / raw)
To: pve-devel
The implementation is similar to pve_static, but extends the node and
resource stats with sampled runtime usage statistics, i.e., the actual
usage on the nodes and the actual usage of the resources.
If users repeatedly call score_nodes_to_start_resource() and then add
the scored resources as starting resources with add_resource(), these
starting resources need to be accumulated on top of the nodes' actual
current usage to prevent score_nodes_to_start_resource() from favoring
the currently least loaded node(s) for all starting resources.
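The accumulation rationale can be sketched with a minimal, self-contained
example (Stats and best_node() are simplified stand-ins, not the crate's
actual API):

```rust
#[derive(Clone, Copy)]
struct Stats {
    cpu: f64,
}

// Score nodes by projected CPU load; lower is better. Resources that are
// already "starting" on a node are accumulated onto that node's base load
// before the incoming resource is considered.
fn best_node(nodes: &[(String, Stats)], starting: &[(String, Stats)], incoming: Stats) -> String {
    nodes
        .iter()
        .map(|(name, stats)| {
            let pending: f64 = starting
                .iter()
                .filter(|(node, _)| node == name)
                .map(|(_, s)| s.cpu)
                .sum();
            (name.clone(), stats.cpu + pending + incoming.cpu)
        })
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name)
        .unwrap()
}

fn main() {
    let nodes = vec![
        ("node1".to_string(), Stats { cpu: 0.1 }),
        ("node2".to_string(), Stats { cpu: 0.5 }),
    ];
    let incoming = Stats { cpu: 0.6 };

    // First placement goes to the least loaded node.
    let first = best_node(&nodes, &[], incoming);
    assert_eq!(first, "node1");

    // With the first resource tracked as "starting" on node1, the second
    // placement correctly prefers node2; without accumulation, both
    // resources would have landed on node1.
    let starting = vec![(first.clone(), incoming)];
    let second = best_node(&nodes, &starting, incoming);
    assert_eq!(second, "node2");

    println!("{first} {second}");
}
```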
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- move this patch one before 'expose auto rebalancing methods' as this
is the same change order as done in pve-ha-manager, making it easier
to separate the feature of using dynamic usage information and
afterwards allowing rebalancing methods with static and dynamic usage
information
- adapt patch message accordingly
- s/service/resource/ for any new struct and method as this is more
consistent with the naming in the HA Manager and the name of the
crate/module itself; can change this back if the old naming is
preferred, but as these are new API endpoints, I thought it's better to
do it now than later
pve-rs/Makefile | 1 +
.../src/bindings/resource_scheduling/mod.rs | 3 +
.../resource_scheduling/pve_dynamic.rs | 174 ++++++++++++++++++
.../src/bindings/resource_scheduling/usage.rs | 33 ++++
pve-rs/test/resource_scheduling.pl | 1 +
5 files changed, 212 insertions(+)
create mode 100644 pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
diff --git a/pve-rs/Makefile b/pve-rs/Makefile
index 9faa735..f0212b7 100644
--- a/pve-rs/Makefile
+++ b/pve-rs/Makefile
@@ -30,6 +30,7 @@ PERLMOD_PACKAGES := \
PVE::RS::OCI \
PVE::RS::OpenId \
PVE::RS::ResourceScheduling::Static \
+ PVE::RS::ResourceScheduling::Dynamic \
PVE::RS::SDN::Fabrics \
PVE::RS::TFA
diff --git a/pve-rs/src/bindings/resource_scheduling/mod.rs b/pve-rs/src/bindings/resource_scheduling/mod.rs
index 9ce631c..87b4a03 100644
--- a/pve-rs/src/bindings/resource_scheduling/mod.rs
+++ b/pve-rs/src/bindings/resource_scheduling/mod.rs
@@ -5,3 +5,6 @@ mod usage;
mod pve_static;
pub use pve_static::pve_rs_resource_scheduling_static;
+
+mod pve_dynamic;
+pub use pve_dynamic::pve_rs_resource_scheduling_dynamic;
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
new file mode 100644
index 0000000..5b4373e
--- /dev/null
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -0,0 +1,174 @@
+#[perlmod::package(name = "PVE::RS::ResourceScheduling::Dynamic", lib = "pve_rs")]
+pub mod pve_rs_resource_scheduling_dynamic {
+ //! The `PVE::RS::ResourceScheduling::Dynamic` package.
+ //!
+ //! Provides bindings for the dynamic resource scheduling module.
+ //!
+ //! See [`proxmox_resource_scheduling`].
+
+ use std::sync::Mutex;
+
+ use anyhow::Error;
+ use serde::{Deserialize, Serialize};
+
+ use perlmod::Value;
+ use proxmox_resource_scheduling::node::NodeStats;
+ use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::usage::Usage;
+
+ use crate::bindings::resource_scheduling::resource::PveResource;
+ use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+
+ perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
+
+ /// A scheduler instance contains the cluster usage.
+ pub struct Scheduler {
+ inner: Mutex<Usage>,
+ }
+
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Dynamic usage stats of a node.
+ pub struct DynamicNodeStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Total number of CPU cores.
+ pub maxcpu: usize,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Total memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<DynamicNodeStats> for NodeStats {
+ fn from(value: DynamicNodeStats) -> Self {
+ Self {
+ cpu: value.cpu,
+ maxcpu: value.maxcpu,
+ mem: value.mem,
+ maxmem: value.maxmem,
+ }
+ }
+ }
+
+ #[derive(Clone, Copy, Debug, Serialize, Deserialize)]
+ #[serde(rename_all = "kebab-case")]
+ /// Dynamic usage stats of a resource.
+ pub struct DynamicResourceStats {
+ /// CPU utilization in CPU cores.
+ pub cpu: f64,
+ /// Number of assigned CPUs or CPU limit.
+ pub maxcpu: f64,
+ /// Used memory in bytes.
+ pub mem: usize,
+ /// Maximum assigned memory in bytes.
+ pub maxmem: usize,
+ }
+
+ impl From<DynamicResourceStats> for ResourceStats {
+ fn from(value: DynamicResourceStats) -> Self {
+ Self {
+ cpu: value.cpu,
+ maxcpu: value.maxcpu,
+ mem: value.mem,
+ maxmem: value.maxmem,
+ }
+ }
+ }
+
+ type DynamicResource = PveResource<DynamicResourceStats>;
+
+ /// Class method: Create a new [`Scheduler`] instance.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::new`].
+ #[export(raw_return)]
+ pub fn new(#[raw] class: Value) -> Result<Value, Error> {
+ let inner = Usage::new();
+
+ Ok(perlmod::instantiate_magic!(
+ &class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
+ ))
+ }
+
+ /// Method: Add a node with its basic CPU and memory info.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_node`].
+ #[export]
+ pub fn add_node(
+ #[try_from_ref] this: &Scheduler,
+ nodename: String,
+ stats: DynamicNodeStats,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_node(nodename, stats.into())
+ }
+
+ /// Method: Remove a node from the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_node`].
+ #[export]
+ pub fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.remove_node(nodename);
+ }
+
+ /// Method: Get a list of all the nodes in the scheduler.
+ #[export]
+ pub fn list_nodes(#[try_from_ref] this: &Scheduler) -> Vec<String> {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .nodenames_iter()
+ .map(|nodename| nodename.to_string())
+ .collect()
+ }
+
+ /// Method: Check whether a node exists in the scheduler.
+ #[export]
+ pub fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
+ let usage = this.inner.lock().unwrap();
+
+ usage.contains_node(nodename)
+ }
+
+ /// Method: Add `resource` with identifier `sid` to the scheduler.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::add_resource`].
+ #[export]
+ pub fn add_resource(
+ #[try_from_ref] this: &Scheduler,
+ sid: String,
+ resource: DynamicResource,
+ ) -> Result<(), Error> {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.add_resource(sid, resource.try_into()?)
+ }
+
+ /// Method: Remove resource `sid` and its usage from all assigned nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::usage::Usage::remove_resource`].
+ #[export]
+ fn remove_resource(#[try_from_ref] this: &Scheduler, sid: &str) {
+ let mut usage = this.inner.lock().unwrap();
+
+ usage.remove_resource(sid);
+ }
+
+ /// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
+ #[export]
+ pub fn score_nodes_to_start_resource(
+ #[try_from_ref] this: &Scheduler,
+ resource_stats: DynamicResourceStats,
+ ) -> Result<Vec<(String, f64)>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .to_scheduler::<StartingAsStartedResourceAggregator>()
+ .score_nodes_to_start_resource(resource_stats)
+ }
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index fc8b872..87b7e3e 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -1,4 +1,5 @@
use proxmox_resource_scheduling::{
+ resource::ResourceState,
scheduler::NodeUsage,
usage::{Usage, UsageAggregator},
};
@@ -31,3 +32,35 @@ impl UsageAggregator for StartedResourceAggregator {
.collect()
}
}
+
+/// An aggregator, which uses the node base stats and adds any starting resources as already
+/// started resources to the node stats.
+///
+/// This aggregator is useful if starting resources should be considered in the scheduler.
+pub(crate) struct StartingAsStartedResourceAggregator;
+
+impl UsageAggregator for StartingAsStartedResourceAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| {
+ let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
+ let mut node_stats = node_stats;
+
+ if let Some(resource) = usage.get_resource(sid)
+ && resource.state() == ResourceState::Starting
+ {
+ node_stats.add_started_resource(&resource.stats());
+ }
+
+ node_stats
+ });
+
+ NodeUsage {
+ name: nodename.to_string(),
+ stats,
+ }
+ })
+ .collect()
+ }
+}
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
index a332269..3775242 100755
--- a/pve-rs/test/resource_scheduling.pl
+++ b/pve-rs/test/resource_scheduling.pl
@@ -6,6 +6,7 @@ use warnings;
use Test::More;
use PVE::RS::ResourceScheduling::Static;
+use PVE::RS::ResourceScheduling::Dynamic;
my sub score_nodes {
my ($static, $service) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* Re: [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
@ 2026-03-27 14:15 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:15 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm, modulo one nit to consider
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> The implementation is similar to pve_static, but extends the node and
> resource stats with sampled runtime usage statistics, i.e., the actual
> usage on the nodes and the actual usages of the resources.
>
> If users repeatedly call score_nodes_to_start_resource() and then add
> the scored resources as starting resources with add_resource(), these
> starting resources need to be accumulated on top of the nodes' actual
> current usage to prevent score_nodes_to_start_resource() from favoring
> the currently least loaded node(s) for all starting resources.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - move this patch one before 'expose auto rebalancing methods' as this
> is the same change order as done in pve-ha-manager, making it easier
> to separate the feature of using dynamic usage information and
> afterwards allowing rebalancing methods with static and dynamic usage
> information
> - adapt patch message accordingly
> - s/service/resource/ for any new struct and method as this is more
> consistent with the naming in the HA Manager and the name of the
> crate/module itself; can change this back if it's better in the other
> way, but as these are new API endpoints, I thought it's better to do
> it now than later
>
[snip]
> +impl UsageAggregator for StartingAsStartedResourceAggregator {
> + fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
> + usage
> + .nodes_iter()
> + .map(|(nodename, node)| {
nice fold!
nit: by making `node_stats` mutable in the first place, variable
shadowing can be avoided, see:
let stats = node.resources_iter().fold(node.stats(), |mut node_stats, sid| {
    if let Some(resource) = usage.get_resource(sid)
        && resource.state() == ResourceState::Starting
    {
        node_stats.add_started_resource(&resource.stats());
    }
    node_stats
});
> + let stats = node.resources_iter().fold(node.stats(), |node_stats, sid| {
> + let mut node_stats = node_stats;
> +
> + if let Some(resource) = usage.get_resource(sid)
> + && resource.state() == ResourceState::Starting
> + {
> + node_stats.add_started_resource(&resource.stats());
> + }
> +
> + node_stats
> + });
> +
> + NodeUsage {
> + name: nodename.to_string(),
> + stats,
> + }
> + })
> + .collect()
> + }
> +}
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (14 preceding siblings ...)
2026-03-24 18:29 ` [PATCH perl-rs v2 15/40] pve-rs: resource-scheduling: implement pve_dynamic bindings Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-27 14:16 ` Dominik Rusovac
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
` (23 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These bindings expose the auto rebalancing methods of both the static
and the dynamic scheduler.
As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
takes a possibly very large list of migration candidates, the binding
accepts a more compact representation, which reduces the size of the
data that needs to be generated on the caller's side and therefore the
runtime of the serialization from Perl to Rust.
Additionally, while decomposing the compact representation, the input
data is validated, since the underlying scoring methods do not further
validate whether their input is consistent with the cluster usage.
The method names score_best_balancing_migration_candidates{,_topsis}()
are chosen deliberately, so that future extensions can implement
score_best_balancing_migrations{,_topsis}(), which might allow scoring
migrations without providing the candidates.
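The expansion performed by decompose_compact_migration_candidates() can
be sketched with simplified stand-in types (the real binding additionally
validates each resource against the cluster usage, looks up the leader's
node there, and sums the bundle's stats):

```rust
// Simplified stand-ins; the real crate's Migration/MigrationCandidate
// also carry the bundle's accumulated resource stats.
#[derive(Debug)]
struct Migration {
    sid: String,
    source_node: String,
    target_node: String,
}

struct Compact {
    leader: String,
    // Looked up from the cluster usage in the real binding.
    source_node: String,
    nodes: Vec<String>,
}

// Expand one compact candidate into one migration per target node.
fn decompose(compact: &Compact) -> Vec<Migration> {
    compact
        .nodes
        .iter()
        .map(|target| Migration {
            sid: compact.leader.clone(),
            source_node: compact.source_node.clone(),
            target_node: target.clone(),
        })
        .collect()
}

fn main() {
    let compact = Compact {
        leader: "vm:100".to_string(),
        source_node: "node1".to_string(),
        nodes: vec!["node2".to_string(), "node3".to_string()],
    };
    let migrations = decompose(&compact);
    assert_eq!(migrations.len(), 2);
    assert_eq!(migrations[0].target_node, "node2");
    assert_eq!(migrations[1].source_node, "node1");
    println!("{migrations:?}");
}
```

This is why the compact form saves serialization work: the caller sends
each leader and its node list once instead of one fully expanded
candidate per (leader, target node) pair.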
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- improve patch message and documentation
- move to the end of the perl-rs changes, which makes it more consistent
with the change order in pve-ha-manager as well
- uses `UsageAggregator` now to discern how usages are accumulated
- s/generate_migration_candidates_from
/decompose_compact_migration_candidates
- make the decomposition of compact migration candidates more robust and
do not use any unwraps or other causes of panic but the Mutex guard
unwrap
.../resource_scheduling/pve_dynamic.rs | 57 +++++++++++-
.../resource_scheduling/pve_static.rs | 56 +++++++++++-
.../bindings/resource_scheduling/resource.rs | 88 ++++++++++++++++++-
.../src/bindings/resource_scheduling/usage.rs | 15 ++++
4 files changed, 211 insertions(+), 5 deletions(-)
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
index 5b4373e..26f36d1 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_dynamic.rs
@@ -14,10 +14,15 @@ pub mod pve_rs_resource_scheduling_dynamic {
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::scheduler::ScoredMigration;
use proxmox_resource_scheduling::usage::Usage;
- use crate::bindings::resource_scheduling::resource::PveResource;
- use crate::bindings::resource_scheduling::usage::StartingAsStartedResourceAggregator;
+ use crate::bindings::resource_scheduling::resource::{
+ CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+ };
+ use crate::bindings::resource_scheduling::usage::{
+ IdentityAggregator, StartingAsStartedResourceAggregator,
+ };
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Dynamic");
@@ -157,6 +162,54 @@ pub mod pve_rs_resource_scheduling_dynamic {
usage.remove_resource(sid);
}
+ /// Method: Returns the load imbalance among the nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+ #[export]
+ pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+ let usage = this.inner.lock().unwrap();
+
+ usage.to_scheduler::<IdentityAggregator>().node_imbalance()
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ Ok(usage
+ .to_scheduler::<IdentityAggregator>()
+ .score_best_balancing_migration_candidates(candidates, limit))
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// the TOPSIS method.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates_topsis(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ usage
+ .to_scheduler::<IdentityAggregator>()
+ .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+ }
+
/// Method: Scores nodes to start a resource with the usage statistics `resource_stats` on.
///
/// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/pve_static.rs b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
index e2756db..7924889 100644
--- a/pve-rs/src/bindings/resource_scheduling/pve_static.rs
+++ b/pve-rs/src/bindings/resource_scheduling/pve_static.rs
@@ -14,10 +14,14 @@ pub mod pve_rs_resource_scheduling_static {
use perlmod::Value;
use proxmox_resource_scheduling::node::NodeStats;
use proxmox_resource_scheduling::resource::ResourceStats;
+ use proxmox_resource_scheduling::scheduler::ScoredMigration;
use proxmox_resource_scheduling::usage::Usage;
use crate::bindings::resource_scheduling::{
- resource::PveResource, usage::StartedResourceAggregator,
+ resource::{
+ CompactMigrationCandidate, PveResource, decompose_compact_migration_candidates,
+ },
+ usage::StartedResourceAggregator,
};
perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
@@ -154,6 +158,56 @@ pub mod pve_rs_resource_scheduling_static {
usage.remove_resource(sid);
}
+ /// Method: Returns the load imbalance among the nodes.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::node_imbalance`].
+ #[export]
+ pub fn calculate_node_imbalance(#[try_from_ref] this: &Scheduler) -> f64 {
+ let usage = this.inner.lock().unwrap();
+
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .node_imbalance()
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// exhaustive search.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ Ok(usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_best_balancing_migration_candidates(candidates, limit))
+ }
+
+ /// Method: Scores the given migration `candidates` by the best node imbalance improvement with
+ /// the TOPSIS method.
+ ///
+ /// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_best_balancing_migration_candidates_topsis`].
+ #[export]
+ pub fn score_best_balancing_migration_candidates_topsis(
+ #[try_from_ref] this: &Scheduler,
+ candidates: Vec<CompactMigrationCandidate>,
+ limit: usize,
+ ) -> Result<Vec<ScoredMigration>, Error> {
+ let usage = this.inner.lock().unwrap();
+
+ let candidates = decompose_compact_migration_candidates(&usage, candidates)?;
+
+ usage
+ .to_scheduler::<StartedResourceAggregator>()
+ .score_best_balancing_migration_candidates_topsis(&candidates, limit)
+ }
+
/// Method: Scores nodes to start a service with the usage statistics `service_stats` on.
///
/// See [`proxmox_resource_scheduling::scheduler::Scheduler::score_nodes_to_start_resource`].
diff --git a/pve-rs/src/bindings/resource_scheduling/resource.rs b/pve-rs/src/bindings/resource_scheduling/resource.rs
index 91d56b9..9186d5b 100644
--- a/pve-rs/src/bindings/resource_scheduling/resource.rs
+++ b/pve-rs/src/bindings/resource_scheduling/resource.rs
@@ -1,6 +1,8 @@
use anyhow::{Error, bail};
-use proxmox_resource_scheduling::resource::{
- Resource, ResourcePlacement, ResourceState, ResourceStats,
+use proxmox_resource_scheduling::{
+ resource::{Resource, ResourcePlacement, ResourceState, ResourceStats},
+ scheduler::{Migration, MigrationCandidate},
+ usage::Usage,
};
use serde::{Deserialize, Serialize};
@@ -42,3 +44,85 @@ impl<T: Into<ResourceStats>> TryFrom<PveResource<T>> for Resource {
Ok(Resource::new(resource.stats.into(), state, placement))
}
}
+
+/// A compact representation of [`proxmox_resource_scheduling::scheduler::MigrationCandidate`].
+#[derive(Serialize, Deserialize)]
+pub struct CompactMigrationCandidate {
+ /// The identifier of the leading resource.
+ pub leader: String,
+ /// The resources which are part of the leading resource's bundle.
+ pub resources: Vec<String>,
+ /// The nodes, which are possible to migrate to for the resources.
+ pub nodes: Vec<String>,
+}
+
+/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
+/// usage from `usage`.
+///
+/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
+///
+/// - the `leader` is not present in the cluster usage
+/// - the `leader` is non-stationary
+/// - any resource in `resources` is not present in the cluster usage
+/// - any resource in `resources` is non-stationary
+/// - any resource in `resources` is on another node than the `leader`
+pub(crate) fn decompose_compact_migration_candidates(
+ usage: &Usage,
+ compact_candidates: Vec<CompactMigrationCandidate>,
+) -> Result<Vec<MigrationCandidate>, Error> {
+ // The length of `compact_candidates` is at least a lower bound
+ let mut candidates = Vec::with_capacity(compact_candidates.len());
+
+ for candidate in compact_candidates.into_iter() {
+ let leader_sid = candidate.leader;
+ let leader = match usage.get_resource(&leader_sid) {
+ Some(resource) => resource,
+ _ => bail!("leader '{leader_sid}' is not present in the cluster usage"),
+ };
+ let leader_node = match leader.placement() {
+ ResourcePlacement::Stationary { current_node } => current_node,
+ _ => bail!("leader '{leader_sid}' is non-stationary"),
+ };
+
+ if !candidate.resources.contains(&leader_sid) {
+ bail!("leader '{leader_sid}' is not present in the resources list");
+ }
+
+ let mut resource_stats = Vec::with_capacity(candidate.resources.len());
+
+ for sid in candidate.resources.iter() {
+ let resource = match usage.get_resource(sid) {
+ Some(resource) => resource,
+ _ => bail!("resource '{sid}' is not present in the cluster usage"),
+ };
+
+ match resource.placement() {
+ ResourcePlacement::Stationary { current_node } => {
+ if current_node != leader_node {
+ bail!("resource '{sid}' is on another node than the leader");
+ }
+
+ resource_stats.push(resource.stats());
+ }
+ _ => bail!("resource '{sid}' is non-stationary"),
+ }
+ }
+
+ let bundle_stats = resource_stats.into_iter().sum();
+
+ for target_node in candidate.nodes.into_iter() {
+ let migration = Migration {
+ sid: leader_sid.to_string(),
+ source_node: leader_node.to_string(),
+ target_node,
+ };
+
+ candidates.push(MigrationCandidate {
+ migration,
+ stats: bundle_stats,
+ });
+ }
+ }
+
+ Ok(candidates)
+}
diff --git a/pve-rs/src/bindings/resource_scheduling/usage.rs b/pve-rs/src/bindings/resource_scheduling/usage.rs
index 87b7e3e..48f6e84 100644
--- a/pve-rs/src/bindings/resource_scheduling/usage.rs
+++ b/pve-rs/src/bindings/resource_scheduling/usage.rs
@@ -4,6 +4,21 @@ use proxmox_resource_scheduling::{
usage::{Usage, UsageAggregator},
};
+/// The identity aggregator, which passes the node stats as-is.
+pub(crate) struct IdentityAggregator;
+
+impl UsageAggregator for IdentityAggregator {
+ fn aggregate(usage: &Usage) -> Vec<NodeUsage> {
+ usage
+ .nodes_iter()
+ .map(|(nodename, node)| NodeUsage {
+ name: nodename.to_string(),
+ stats: node.stats(),
+ })
+ .collect()
+ }
+}
+
/// An aggregator, which adds any resource as a started resource.
///
/// This aggregator is useful if the node base stats do not have any current usage.
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* Re: [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
@ 2026-03-27 14:16 ` Dominik Rusovac
0 siblings, 0 replies; 64+ messages in thread
From: Dominik Rusovac @ 2026-03-27 14:16 UTC (permalink / raw)
To: Daniel Kral, pve-devel
lgtm
On Tue Mar 24, 2026 at 7:30 PM CET, Daniel Kral wrote:
> These methods expose the auto rebalancing methods of both the static and
> dynamic scheduler.
>
> As Scheduler::score_best_balancing_migration_candidates{,_topsis}()
> takes a possibly very large list of migration candidates, the binding
> accepts a more compact representation, which reduces the size of the
> data that needs to be generated on the caller's side and therefore the
> runtime of the serialization from Perl to Rust.
>
> Additionally, while decomposing the compact representation the input
> data is validated since the underlying scoring methods do not further
> validate whether their input is consistent with the cluster usage.
>
> The method names score_best_balancing_migration_candidates{,_topsis}()
> are chosen deliberately, so that future extensions can implement
> score_best_balancing_migrations{,_topsis}(), which might allow scoring
> migrations without providing the candidates.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - improve patch message and documentation
> - move to the end of the perl-rs changes, which makes it more consistent
> with the change order in pve-ha-manager as well
> - uses `UsageAggregator` now to discern how usages are accumulated
> - s/generate_migration_candidates_from
> /decompose_compact_migration_candidates
> - make the decomposition of compact migration candidates more robust and
> do not use any unwraps or other causes of panic but the Mutex guard
> unwrap
>
[snip]
> +/// Transforms a `Vec<CompactMigrationCandidate>` to a `Vec<MigrationCandidate>` with the cluster
> +/// usage from `usage`.
> +///
> +/// This function fails for any of the following conditions for a [`CompactMigrationCandidate`]:
> +///
> +/// - the `leader` is not present in the cluster usage
> +/// - the `leader` is non-stationary
> +/// - any resource in `resources` is not present in the cluster usage
> +/// - any resource in `resources` is non-stationary
> +/// - any resource in `resources` is on another node than the `leader`
nice idea, to use Vec::with_capacity in here
> +pub(crate) fn decompose_compact_migration_candidates(
> + usage: &Usage,
> + compact_candidates: Vec<CompactMigrationCandidate>,
> +) -> Result<Vec<MigrationCandidate>, Error> {
> + // The length of `compact_candidates` is at least a lower bound
> + let mut candidates = Vec::with_capacity(compact_candidates.len());
[snip]
Reviewed-by: Dominik Rusovac <d.rusovac@proxmox.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (15 preceding siblings ...)
2026-03-24 18:30 ` [PATCH perl-rs v2 16/40] pve-rs: resource-scheduling: expose auto rebalancing methods Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
` (22 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
This makes it a little easier to read and allows appending descriptions
for other values with a cleaner diff.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/DataCenterConfig.pm | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index d88b167..e7bc8f1 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -17,9 +17,12 @@ my $crs_format = {
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
- verbose_description => "Configures how the HA manager should select nodes to start or "
- . "recover services. With 'basic', only the number of services is used, with 'static', "
- . "static CPU and memory configuration of services is considered.",
+ verbose_description => <<EODESC,
+Configures how the HA Manager should select nodes to start or recover services:
+
+- with 'basic', only the number of services is used,
+- with 'static', static CPU and memory configuration of services is considered.
+EODESC
},
'ha-rebalance-on-start' => {
type => 'boolean',
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread

* [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (16 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 17/40] datacenter config: restructure verbose description for the ha crs option Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
` (21 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- slightly changed wording (as suggested by @Maximiliano)
src/PVE/DataCenterConfig.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index e7bc8f1..396c962 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -13,7 +13,7 @@ my $PROXMOX_OUI = 'BC:24:11';
my $crs_format = {
ha => {
type => 'string',
- enum => ['basic', 'static'],
+ enum => ['basic', 'static', 'dynamic'],
optional => 1,
default => 'basic',
description => "Use this resource scheduler mode for HA.",
@@ -21,7 +21,8 @@ my $crs_format = {
Configures how the HA Manager should select nodes to start or recover services:
- with 'basic', only the number of services is used,
-- with 'static', static CPU and memory configuration of services is considered.
+- with 'static', static CPU and memory configuration of services is considered,
+- with 'dynamic', static and dynamic CPU and memory usage of services is considered.
EODESC
},
'ha-rebalance-on-start' => {
--
2.47.3
* [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (17 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 18/40] datacenter config: add dynamic load scheduler option Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-26 16:08 ` Jillian Morgan
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
` (20 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- slightly changed wording here (as suggested by @Maximiliano)
src/PVE/DataCenterConfig.pm | 39 +++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/src/PVE/DataCenterConfig.pm b/src/PVE/DataCenterConfig.pm
index 396c962..52682aa 100644
--- a/src/PVE/DataCenterConfig.pm
+++ b/src/PVE/DataCenterConfig.pm
@@ -33,6 +33,45 @@ EODESC
"Set to use CRS for selecting a suited node when a HA services request-state"
. " changes from stop to start.",
},
+ 'ha-auto-rebalance' => {
+ type => 'boolean',
+ optional => 1,
+ default => 0,
+ description => "Whether to use CRS for balancing HA resources automatically"
+ . " depending on the current node imbalance.",
+ },
+ 'ha-auto-rebalance-threshold' => {
+ type => 'number',
+ optional => 1,
+ default => 0.7,
+ requires => 'ha-auto-rebalance',
+ description => "The threshold for the node load, which will trigger the automatic"
+ . " resource balancing system if its value is exceeded.",
+ },
+ 'ha-auto-rebalance-method' => {
+ type => 'string',
+ enum => ['bruteforce', 'topsis'],
+ optional => 1,
+ default => 'bruteforce',
+ requires => 'ha-auto-rebalance',
+ description => "The method to use for the scoring of rebalancing migrations.",
+ },
+ 'ha-auto-rebalance-hold-duration' => {
+ type => 'number',
+ optional => 1,
+ default => 3,
+ requires => 'ha-auto-rebalance',
+ description => "The duration the threshold must be exceeded for to trigger an automatic"
+ . " resource balancing migration in HA rounds.",
+ },
+ 'ha-auto-rebalance-margin' => {
+ type => 'number',
+ optional => 1,
+ default => 0.1,
+ requires => 'ha-auto-rebalance',
+ description => "The minimum relative improvement in cluster node imbalance to commit to"
+ . " a resource rebalancing migration.",
+ },
};
my $migration_format = {
--
2.47.3
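The threshold and margin options above interact roughly as follows. This is an illustrative Python sketch of the config semantics described in the schema, not the actual Perl implementation; the function name and the scalar "imbalance" measure are assumptions:

```python
def should_commit_rebalance(current_imbalance, new_imbalance, margin=0.1):
    """Commit a candidate migration only if it improves the cluster node
    imbalance by at least the relative margin (cf. 'ha-auto-rebalance-margin',
    default 0.1). The imbalance metric itself is assumed, not from the patch."""
    if current_imbalance <= 0:
        return False  # nothing to improve
    improvement = (current_imbalance - new_imbalance) / current_imbalance
    return improvement >= margin
```

For example, reducing the imbalance from 1.0 to 0.85 (a 15% relative improvement) clears the default 10% margin, while 1.0 to 0.95 does not.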
* Re: [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-26 16:08 ` Jillian Morgan
2026-03-26 16:20 ` Daniel Kral
0 siblings, 1 reply; 64+ messages in thread
From: Jillian Morgan @ 2026-03-26 16:08 UTC (permalink / raw)
To: Daniel Kral; +Cc: pve-devel
On Tue, Mar 24, 2026 at 2:34 PM Daniel Kral <d.kral@proxmox.com> wrote:
> + 'ha-auto-rebalance-hold-duration' => {
> + type => 'number',
> + optional => 1,
> + default => 3,
> + requires => 'ha-auto-rebalance',
> + description => "The duration the threshold must be exceeded for
> to trigger an automatic"
> + . " resource balancing migration in HA rounds.",
> + },
>
>
What are the units of these duration numbers? Milliseconds or days? ;-)
Perhaps it is the "HA rounds" part that is key here but the statement is
unclear to me. Is that a duration, or a discrete number of events? How long
is each "HA round"?
Perhaps a clarification like this: "The number of HA Rounds for which
the ha-auto-rebalance-threshold must be exceeded before triggering an
automatic resource balancing migration."
And perhaps an additional hint could be provided that an HA Round is "10
seconds" (I think?)
-- Jillian
* Re: [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options
2026-03-26 16:08 ` Jillian Morgan
@ 2026-03-26 16:20 ` Daniel Kral
0 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-26 16:20 UTC (permalink / raw)
To: Jillian Morgan; +Cc: pve-devel
On Thu Mar 26, 2026 at 5:08 PM CET, Jillian Morgan wrote:
> On Tue, Mar 24, 2026 at 2:34 PM Daniel Kral <d.kral@proxmox.com> wrote:
>
>> + 'ha-auto-rebalance-hold-duration' => {
>> + type => 'number',
>> + optional => 1,
>> + default => 3,
>> + requires => 'ha-auto-rebalance',
>> + description => "The duration the threshold must be exceeded for
>> to trigger an automatic"
>> + . " resource balancing migration in HA rounds.",
>> + },
>>
>>
> What are the units of these duration numbers? Milliseconds or days? ;-)
> Perhaps it is the "HA rounds" part that is key here but the statement is
> unclear to me. Is that a duration, or a discrete number of events? How long
> is each "HA round"?
>
> Perhaps a clarification like this: "The number of HA Rounds for which
> the ha-auto-rebalance-threshold must be exceeded before triggering an
> automatic resource balancing migration."
> And perhaps an additional hint could be provided that an HA Round is "10
> seconds" (I think?)
Hi Jillian!
Thanks for taking a look!
You're right, there should be more emphasis on the 'HA rounds' part!
I thought about using seconds in v1, but I went with the HA rounds as in
'number of repeated tries' as that measure is a better guarantee.
Putting "The number of HA rounds" at the start makes the 'unit' for this
property also clearer, will change it to that and add a hint about the
length of the HA rounds.
Best regards
Daniel
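The agreed-upon semantics — a count of consecutive HA rounds (each roughly 10 seconds, per the discussion above) during which the threshold is exceeded — can be sketched as follows. This is an illustrative Python analogue, not the actual implementation; the class name and the reset-on-dip behavior are assumptions:

```python
class RebalanceTrigger:
    """Tracks how many consecutive HA rounds the node load has exceeded the
    threshold; fires once the hold duration (in rounds) is reached."""

    def __init__(self, threshold=0.7, hold_rounds=3):
        self.threshold = threshold      # cf. 'ha-auto-rebalance-threshold'
        self.hold_rounds = hold_rounds  # cf. 'ha-auto-rebalance-hold-duration'
        self.exceeded_for = 0

    def observe(self, load):
        """Call once per HA round; returns True when rebalancing should run."""
        if load > self.threshold:
            self.exceeded_for += 1
        else:
            self.exceeded_for = 0  # assumed: any round below threshold resets
        return self.exceeded_for >= self.hold_rounds
```

With the defaults, three consecutive rounds above 0.7 trigger a rebalance, and a single round below the threshold starts the count over.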
* [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (18 preceding siblings ...)
2026-03-24 18:30 ` [PATCH cluster v2 19/40] datacenter config: add auto rebalancing options Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-25 21:43 ` Thomas Lamprecht
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
` (19 subsequent siblings)
39 siblings, 1 reply; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Fetch the dynamic node and service stats with rrd_dump(); these are
periodically sampled and broadcast by the PVE nodes' pvestatd service
and propagated through the pmxcfs.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- use constants for the RRD entry indices
- add note about the capping of the maxcpu property for guests
src/PVE/HA/Env/PVE2.pm | 63 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 04cd1bfe..4dfb304e 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -42,6 +42,19 @@ my $lockdir = "/etc/pve/priv/lock";
# taken from PVE::Service::pvestatd::update_{lxc,qemu}_status()
use constant {
RRD_VM_INDEX_STATUS => 2,
+ RRD_VM_INDEX_MAXCPU => 5,
+ RRD_VM_INDEX_CPU => 6,
+ RRD_VM_INDEX_MAXMEM => 7,
+ RRD_VM_INDEX_MEM => 8,
+};
+
+# rrd entry indices for PVE nodes
+# taken from PVE::Service::pvestatd::update_node_status()
+use constant {
+ RRD_NODE_INDEX_MAXCPU => 4,
+ RRD_NODE_INDEX_CPU => 5,
+ RRD_NODE_INDEX_MAXMEM => 7,
+ RRD_NODE_INDEX_MEM => 8,
};
sub new {
@@ -569,6 +582,30 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self, $id) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = get_cluster_service_stats();
+ for my $sid (keys %$stats) {
+ my $id = $stats->{$sid}->{id};
+ my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
+
+ # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
+ my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
+
+ $stats->{$sid}->{usage} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -588,6 +625,32 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+
+ my $stats = {};
+ for my $key (keys %$rrd) {
+ my ($nodename) = $key =~ m/^pve-node-9.0\/(\w+)$/;
+
+ next if !$nodename;
+
+ my $rrdentry = $rrd->{$key} // [];
+
+ my $maxcpu = int($rrdentry->[RRD_NODE_INDEX_MAXCPU] || 0);
+
+ $stats->{$nodename} = {
+ maxcpu => $maxcpu,
+ cpu => (($rrdentry->[RRD_NODE_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
+ maxmem => int($rrdentry->[RRD_NODE_INDEX_MAXMEM] || 0),
+ mem => int($rrdentry->[RRD_NODE_INDEX_MEM] || 0),
+ };
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
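The normalization in get_dynamic_service_stats() — RRD reports a guest's cpu as a fraction of its maxcpu, so multiplying by maxcpu yields absolute cores — can be sketched in Python. Illustrative only: the helper name is an assumption, while the indices mirror the constants in the patch and missing entries default to zero as in the Perl code:

```python
# rrd entry indices for guests, as in the patch
RRD_VM_INDEX_MAXCPU = 5
RRD_VM_INDEX_CPU = 6
RRD_VM_INDEX_MAXMEM = 7
RRD_VM_INDEX_MEM = 8

def vm_usage_from_rrd(entry):
    """Convert one RRD entry (a list that may contain None) into absolute
    usage values: cpu fraction * maxcpu gives cores, memory stays in bytes."""
    maxcpu = float(entry[RRD_VM_INDEX_MAXCPU] or 0.0)
    return {
        "maxcpu": maxcpu,
        "cpu": float(entry[RRD_VM_INDEX_CPU] or 0.0) * maxcpu,
        "maxmem": int(entry[RRD_VM_INDEX_MAXMEM] or 0),
        "mem": int(entry[RRD_VM_INDEX_MEM] or 0),
    }
```

For a guest with maxcpu 4 and an RRD cpu fraction of 0.5, this yields an absolute load of 2.0 cores.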
* Re: [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-25 21:43 ` Thomas Lamprecht
0 siblings, 0 replies; 64+ messages in thread
From: Thomas Lamprecht @ 2026-03-25 21:43 UTC (permalink / raw)
To: Daniel Kral, pve-devel
Am 24.03.26 um 19:31 schrieb Daniel Kral:
> Fetch the dynamic node and service stats with rrd_dump(); these are
> periodically sampled and broadcast by the PVE nodes' pvestatd service
> and propagated through the pmxcfs.
one small code issue inline that can be fixed up too on applying if nothing
else comes up.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - use constants for the RRD entry indices
> - add note about the capping of the maxcpu property for guests
>
> src/PVE/HA/Env/PVE2.pm | 63 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 63 insertions(+)
>
> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
> index 04cd1bfe..4dfb304e 100644
> --- a/src/PVE/HA/Env/PVE2.pm
> +++ b/src/PVE/HA/Env/PVE2.pm
> @@ -569,6 +582,30 @@ sub get_static_service_stats {
> return $stats;
> }
>
> +sub get_dynamic_service_stats {
> + my ($self, $id) = @_;
this $id param is not directly used and no calling site passes any such
param, and it's also shadowed by the one inside the loop below.
> +
> + my $rrd = PVE::Cluster::rrd_dump();
> +
> + my $stats = get_cluster_service_stats();
> + for my $sid (keys %$stats) {
> + my $id = $stats->{$sid}->{id};
> + my $rrdentry = $rrd->{"pve-vm-9.0/$id"} // [];
> +
> + # NOTE the guests' broadcasted vmstatus() caps maxcpu at the node's maxcpu
> + my $maxcpu = ($rrdentry->[RRD_VM_INDEX_MAXCPU] || 0.0) + 0.0;
> +
> + $stats->{$sid}->{usage} = {
> + maxcpu => $maxcpu,
> + cpu => (($rrdentry->[RRD_VM_INDEX_CPU] || 0.0) + 0.0) * $maxcpu,
> + maxmem => int($rrdentry->[RRD_VM_INDEX_MAXMEM] || 0),
> + mem => int($rrdentry->[RRD_VM_INDEX_MEM] || 0),
> + };
> + }
> +
> + return $stats;
> +}
> +
> sub get_static_node_stats {
> my ($self) = @_;
>
* [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (19 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 20/40] env: pve2: implement dynamic node and service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
` (18 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
CRM expects f64 for cpu-related values and usize for mem-related values.
Hence, pass doubles for the former and ints for the latter.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Sim/Hardware.pm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 59afb44a..9f29fa6c 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -488,9 +488,9 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24, maxmem => 131072 },
+ node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +507,7 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4, maxmem => 4096 } } keys %$services };
+ my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
$self->write_static_service_stats($stats);
}
@@ -874,7 +874,7 @@ sub sim_hardware_cmd {
$self->set_static_service_stats(
$sid,
- { maxcpu => $params[0], maxmem => $params[1] },
+ { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
);
} elsif ($action eq 'delete') {
--
2.47.3
* [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (20 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 21/40] sim: hardware: pass correct types for static stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
` (17 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Sim/Hardware.pm | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 9f29fa6c..47839112 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,6 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_maxcpu = 4.0;
+my $default_service_maxmem = 4096 * 1024**2;
+my $default_node_maxcpu = 24.0;
+my $default_node_maxmem = 131072 * 1024**2;
+
# Status directory layout
#
# configuration
@@ -488,9 +493,24 @@ sub new {
|| die "Copy failed: $!\n";
} else {
my $cstatus = {
- node1 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node2 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
- node3 => { power => 'off', network => 'off', maxcpu => 24.0, maxmem => 131072 },
+ node1 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node2 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
+ node3 => {
+ power => 'off',
+ network => 'off',
+ maxcpu => $default_node_maxcpu,
+ maxmem => $default_node_maxmem,
+ },
};
$self->write_hardware_status_nolock($cstatus);
}
@@ -507,7 +527,12 @@ sub new {
copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
} else {
my $services = $self->read_service_config();
- my $stats = { map { $_ => { maxcpu => 4.0, maxmem => 4096 } } keys %$services };
+ my $stats = {
+ map {
+ $_ => { maxcpu => $default_service_maxcpu, maxmem => $default_service_maxmem }
+ }
+ keys %$services
+ };
$self->write_static_service_stats($stats);
}
--
2.47.3
* [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (21 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 22/40] sim: hardware: factor out static stats' default values Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
` (16 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Values of 0 or 0.0 are falsy but still valid stats. Hence, use a
'defined' check to avoid skipping such static service stats.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- do not skip falsy stats
src/PVE/HA/Sim/Hardware.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 47839112..c167abd7 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -202,11 +202,11 @@ sub set_static_service_stats {
my $stats = $self->read_static_service_stats();
- if (my $memory = $new_stats->{maxmem}) {
+ if (defined(my $memory = $new_stats->{maxmem})) {
$stats->{$sid}->{maxmem} = $memory;
}
- if (my $cpu = $new_stats->{maxcpu}) {
+ if (defined(my $cpu = $new_stats->{maxcpu})) {
$stats->{$sid}->{maxcpu} = $cpu;
}
--
2.47.3
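The difference between the old truthiness guard and the new 'defined' check can be illustrated with a small Python analogue, where `is not None` plays the role of Perl's `defined`; the helper name is an assumption:

```python
def merge_stats(stats, new_stats):
    """Merge new values into stats, skipping only keys that are absent
    (None), not keys whose value happens to be falsy such as 0 or 0.0."""
    for key in ("maxcpu", "maxmem"):
        value = new_stats.get(key)
        if value is not None:  # 'defined' check, not `if value:`
            stats[key] = value
    return stats
```

With a plain `if value:` guard, setting maxcpu to 0.0 would be silently ignored; the explicit None check applies it.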
* [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (22 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 23/40] sim: hardware: fix static stats guard Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
` (15 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
This adds functionality to simulate the dynamic stats of a service, that
is, cpu load (cores) and memory usage (MiB).
Analogous to static service stats, tests can specify dynamic service
stats in the file dynamic_service_stats.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- do not skip falsy stats in set_dynamic_service_stats
src/PVE/HA/Sim/Hardware.pm | 52 ++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index c167abd7..cb4a1504 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -21,8 +21,11 @@ use PVE::HA::Groups;
my $watchdog_timeout = 60;
+my $default_service_cpu = 2.0;
my $default_service_maxcpu = 4.0;
+my $default_service_mem = 2048 * 1024**2;
my $default_service_maxmem = 4096 * 1024**2;
+
my $default_node_maxcpu = 24.0;
my $default_node_maxmem = 131072 * 1024**2;
@@ -213,6 +216,25 @@ sub set_static_service_stats {
$self->write_static_service_stats($stats);
}
+sub set_dynamic_service_stats {
+ my ($self, $sid, $new_stats) = @_;
+
+ my $conf = $self->read_service_config();
+ die "no such service '$sid'" if !$conf->{$sid};
+
+ my $stats = $self->read_dynamic_service_stats();
+
+ if (defined(my $memory = $new_stats->{mem})) {
+ $stats->{$sid}->{mem} = $memory;
+ }
+
+ if (defined(my $cpu = $new_stats->{cpu})) {
+ $stats->{$sid}->{cpu} = $cpu;
+ }
+
+ $self->write_dynamic_service_stats($stats);
+}
+
sub add_service {
my ($self, $sid, $opts, $running) = @_;
@@ -438,6 +460,16 @@ sub read_static_service_stats {
return $stats;
}
+sub read_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
+ $self->log('error', "loading dynamic service stats failed - $@") if $@;
+
+ return $stats;
+}
+
sub write_static_service_stats {
my ($self, $stats) = @_;
@@ -446,6 +478,14 @@ sub write_static_service_stats {
$self->log('error', "writing static service stats failed - $@") if $@;
}
+sub write_dynamic_service_stats {
+ my ($self, $stats) = @_;
+
+ my $filename = "$self->{statusdir}/dynamic_service_stats";
+ eval { PVE::HA::Tools::write_json_to_file($filename, $stats) };
+ $self->log('error', "writing dynamic service stats failed - $@") if $@;
+}
+
sub new {
my ($this, $testdir) = @_;
@@ -536,6 +576,18 @@ sub new {
$self->write_static_service_stats($stats);
}
+ if (-f "$testdir/dynamic_service_stats") {
+ copy("$testdir/dynamic_service_stats", "$statusdir/dynamic_service_stats");
+ } else {
+ my $services = $self->read_static_service_stats();
+ my $stats = {
+ map { $_ => { cpu => $default_service_cpu, mem => $default_service_mem } }
+ keys %$services
+ };
+
+ $self->write_dynamic_service_stats($stats);
+ }
+
my $cstatus = $self->read_hardware_status_nolock();
foreach my $node (sort keys %$cstatus) {
--
2.47.3
* [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (23 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 24/40] sim: hardware: handle dynamic service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
` (14 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Add a command to set dynamic service stats and handle the
set-dynamic-stats and set-static-stats commands analogously.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- merge the two branches for set-static-stats and set-dynamic-stats
commands to avoid code duplication
src/PVE/HA/Sim/Hardware.pm | 34 ++++++++++++++++++++++++++--------
src/PVE/HA/Sim/RTHardware.pm | 4 +++-
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index cb4a1504..89180ad7 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -795,7 +795,8 @@ sub get_cfs_state {
# service <sid> stop <timeout>
# service <sid> lock/unlock [lockname]
# service <sid> add <node> [<request-state=started>] [<running=0>]
-# service <sid> set-static-stats <maxcpu> <maxmem>
+# service <sid> set-static-stats [maxcpu <cores>] [maxmem <MiB>]
+# service <sid> set-dynamic-stats [cpu <cores>] [mem <MiB>]
# service <sid> delete
sub sim_hardware_cmd {
my ($self, $cmdstr, $logid) = @_;
@@ -945,15 +946,32 @@ sub sim_hardware_cmd {
$params[2] || 0,
);
- } elsif ($action eq 'set-static-stats') {
- die "sim_hardware_cmd: missing maxcpu for '$action' command" if !$params[0];
- die "sim_hardware_cmd: missing maxmem for '$action' command" if !$params[1];
+ } elsif ($action eq 'set-static-stats' || $action eq 'set-dynamic-stats') {
+ die "sim_hardware_cmd: missing target stat for '$action' command"
+ if !@params;
- $self->set_static_service_stats(
- $sid,
- { maxcpu => 0.0 + $params[0], maxmem => int($params[1]) },
- );
+ my $conversions =
+ $action eq 'set-static-stats'
+ ? { maxcpu => sub { 0.0 + $_[0] }, maxmem => sub { $_[0] * 1024**2 } }
+ : { cpu => sub { 0.0 + $_[0] }, mem => sub { $_[0] * 1024**2 } };
+ my %new_stats;
+ for my ($target, $val) (@params) {
+ die "sim_hardware_cmd: missing value for '$action $target' command"
+ if !defined($val);
+
+ my $convert = $conversions->{$target}
+ or die
+ "sim_hardware_cmd: unknown target stat '$target' for '$action' command";
+
+ $new_stats{$target} = $convert->($val);
+ }
+
+ if ($action eq 'set-static-stats') {
+ $self->set_static_service_stats($sid, \%new_stats);
+ } else {
+ $self->set_dynamic_service_stats($sid, \%new_stats);
+ }
} elsif ($action eq 'delete') {
$self->delete_service($sid);
diff --git a/src/PVE/HA/Sim/RTHardware.pm b/src/PVE/HA/Sim/RTHardware.pm
index 9a83d098..9528f542 100644
--- a/src/PVE/HA/Sim/RTHardware.pm
+++ b/src/PVE/HA/Sim/RTHardware.pm
@@ -532,7 +532,9 @@ sub show_service_add_dialog {
my $maxcpu = $cpu_count_spin->get_value();
my $maxmem = $memory_spin->get_value();
- $self->sim_hardware_cmd("service $sid set-static-stats $maxcpu $maxmem", 'command');
+ $self->sim_hardware_cmd(
+ "service $sid set-static-stats maxcpu $maxcpu maxmem $maxmem", 'command',
+ );
$self->add_service_to_gui($sid);
}
--
2.47.3
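The merged branch's approach — a per-action conversion table applied to alternating (name, value) parameters — can be sketched in Python. Illustrative only: names and error strings are assumptions, while the unit conversions (cores as floats, MiB scaled to bytes) mirror the patch:

```python
def parse_stat_params(action, params):
    """Turn a flat ['name', 'value', ...] parameter list into a typed stats
    dict, using a conversion table selected by the command action."""
    conversions = {
        "set-static-stats": {"maxcpu": float, "maxmem": lambda v: int(v) * 1024**2},
        "set-dynamic-stats": {"cpu": float, "mem": lambda v: int(v) * 1024**2},
    }[action]

    if not params:
        raise ValueError(f"missing target stat for '{action}' command")
    if len(params) % 2:
        raise ValueError(f"missing value for '{action} {params[-1]}' command")

    new_stats = {}
    for name, value in zip(params[::2], params[1::2]):
        if name not in conversions:
            raise ValueError(f"unknown target stat '{name}' for '{action}' command")
        new_stats[name] = conversions[name](value)
    return new_stats
```

Keeping one parsing loop and only swapping the table is what lets the two commands share a single branch.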
* [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (24 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 25/40] sim: hardware: add set-dynamic-stats command Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
` (13 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
From: Dominik Rusovac <d.rusovac@proxmox.com>
Aggregation of dynamic node stats is lazy.
The getters log at warning level when stats are overcommitted.
Signed-off-by: Dominik Rusovac <d.rusovac@proxmox.com>
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
wrt v1:
- keep each commit functional on its own
- allow testing overcommitted scenarios
src/PVE/HA/Sim/Env.pm | 12 ++++++++
src/PVE/HA/Sim/Hardware.pm | 59 ++++++++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+)
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index ad51245c..65d4efad 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -500,12 +500,24 @@ sub get_static_service_stats {
return $self->{hardware}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{hardware}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 89180ad7..c9362fd6 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1196,6 +1196,27 @@ sub get_static_service_stats {
return $stats;
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ my $stats = get_cluster_service_stats($self);
+ my $static_stats = $self->read_static_service_stats();
+ my $dynamic_stats = $self->read_dynamic_service_stats();
+
+ for my $sid (keys %$stats) {
+ $stats->{$sid}->{usage} = {
+ $static_stats->{$sid}->%*, $dynamic_stats->{$sid}->%*,
+ };
+
+ $self->log('warning', "overcommitted cpu on '$sid'")
+ if $stats->{$sid}->{usage}->{cpu} > $stats->{$sid}->{usage}->{maxcpu};
+ $self->log('warning', "overcommitted mem on '$sid'")
+ if $stats->{$sid}->{usage}->{mem} > $stats->{$sid}->{usage}->{maxmem};
+ }
+
+ return $stats;
+}
+
sub get_static_node_stats {
my ($self) = @_;
@@ -1209,6 +1230,44 @@ sub get_static_node_stats {
return $stats;
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ my $stats = $self->get_static_node_stats();
+ for my $node (keys %$stats) {
+ $stats->{$node}->{maxcpu} = $stats->{$node}->{maxcpu} // $default_node_maxcpu;
+ $stats->{$node}->{cpu} = $stats->{$node}->{cpu} // 0.0;
+ $stats->{$node}->{maxmem} = $stats->{$node}->{maxmem} // $default_node_maxmem;
+ $stats->{$node}->{mem} = $stats->{$node}->{mem} // 0;
+ }
+
+ my $service_conf = $self->read_service_config();
+ my $dynamic_service_stats = $self->get_dynamic_service_stats();
+
+ my $cstatus = $self->read_hardware_status_nolock();
+ my $node_service_status = { map { $_ => $self->read_service_status($_) } keys %$cstatus };
+
+ for my $sid (keys %$service_conf) {
+ my $node = $service_conf->{$sid}->{node};
+
+ if ($node_service_status->{$node}->{$sid}) {
+ my ($cpu, $mem) = $dynamic_service_stats->{$sid}->{usage}->@{qw(cpu mem)};
+
+ die "unknown cpu load for '$sid'" if !defined($cpu);
+ $stats->{$node}->{cpu} += $cpu;
+ $self->log('warning', "overcommitted cpu on '$node'")
+ if $stats->{$node}->{cpu} > $stats->{$node}->{maxcpu};
+
+ die "unknown memory usage for '$sid'" if !defined($mem);
+ $stats->{$node}->{mem} += $mem;
+ $self->log('warning', "overcommitted mem on '$node'")
+ if $stats->{$node}->{mem} > $stats->{$node}->{maxmem};
+ }
+ }
+
+ return $stats;
+}
+
sub get_node_version {
my ($self, $node) = @_;
--
2.47.3
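The lazy aggregation in get_dynamic_node_stats() — adding each running service's dynamic usage onto its node and warning on overcommit — can be sketched in Python. This is a simplified, illustrative analogue that omits the per-node service-status lookup; the function name is an assumption:

```python
def aggregate_node_usage(nodes, services, warn):
    """Sum each service's dynamic cpu/mem onto its node's totals, raising on
    unknown usage and calling warn() whenever a node capacity is exceeded."""
    for sid, svc in services.items():
        node = nodes[svc["node"]]
        for used, cap in (("cpu", "maxcpu"), ("mem", "maxmem")):
            value = svc["usage"].get(used)
            if value is None:
                raise ValueError(f"unknown {used} for '{sid}'")
            node[used] += value
            if node[used] > node[cap]:
                warn(f"overcommitted {used} on '{svc['node']}'")
    return nodes
```

Overcommit is only warned about, not rejected, which matches the stated goal of letting tests exercise overcommitted scenarios.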
* [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (25 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 26/40] sim: hardware: add getters for dynamic {node,service} stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
` (12 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The method already depends on three members of the service data, and a
following patch will need a fourth member to add more information to
the Usage implementations.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Manager.pm | 11 +++++------
src/PVE/HA/Usage.pm | 6 +++---
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 421c17da..d4b75ca9 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -281,17 +281,17 @@ sub recompute_online_node_usage {
foreach my $sid (sort keys %{ $self->{ss} }) {
my $sd = $self->{ss}->{$sid};
- $online_node_usage->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $online_node_usage->add_service_usage($sid, $sd);
}
# add remaining non-HA resources to online node usage
for my $sid (sort keys %$service_stats) {
next if $self->{ss}->{$sid};
- my ($node, $state) = $service_stats->{$sid}->@{qw(node state)};
-
# the migration target is not known for non-HA resources
- $online_node_usage->add_service_usage($sid, $state, $node, undef);
+ my $sd = { $service_stats->{$sid}->%{qw(node state)} };
+
+ $online_node_usage->add_service_usage($sid, $sd);
}
$self->{online_node_usage} = $online_node_usage;
@@ -329,8 +329,7 @@ my $change_service_state = sub {
}
$self->{online_node_usage}->remove_service_usage($sid);
- $self->{online_node_usage}
- ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
+ $self->{online_node_usage}->add_service_usage($sid, $sd);
$sd->{uid} = compute_new_uuid($new_state);
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 9f19a82b..6d53f956 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -40,12 +40,12 @@ sub add_service_usage_to_node {
die "implement in subclass";
}
-# Adds service $sid's usage to the online nodes according to their $state,
-# $service_node and $migration_target.
+# Adds service $sid's usage to the online nodes according to their service data $sd.
sub add_service_usage {
- my ($self, $sid, $service_state, $service_node, $migration_target) = @_;
+ my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
+ my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
my ($current_node, $target_node) =
get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (26 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 27/40] usage: pass service data to add_service_usage Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
` (11 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Remove some unnecessary destructuring syntax for the helper.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Rules/ResourceAffinity.pm | 3 +--
src/PVE/HA/Usage.pm | 11 +++++------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 1c610430..474d3000 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -511,8 +511,7 @@ sub get_resource_affinity {
my $get_used_service_nodes = sub {
my ($sid) = @_;
return (undef, undef) if !defined($ss->{$sid});
- my ($state, $node, $target) = $ss->{$sid}->@{qw(state node target)};
- return PVE::HA::Usage::get_used_service_nodes($online_nodes, $state, $node, $target);
+ return PVE::HA::Usage::get_used_service_nodes($online_nodes, $ss->{$sid});
};
for my $csid (keys $positive->%*) {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 6d53f956..5f1ac226 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -45,9 +45,7 @@ sub add_service_usage {
my ($self, $sid, $sd) = @_;
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
- my ($service_state, $service_node, $migration_target) = $sd->@{qw(state node target)};
- my ($current_node, $target_node) =
- get_used_service_nodes($online_nodes, $service_state, $service_node, $migration_target);
+ my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
$self->add_service_usage_to_node($current_node, $sid) if $current_node;
$self->add_service_usage_to_node($target_node, $sid) if $target_node;
@@ -67,10 +65,11 @@ sub score_nodes_to_start_service {
}
# Returns the current and target node as a two-element array, that a service
-# puts load on according to the $online_nodes and the service's $state, $node
-# and $target.
+# puts load on according to the $online_nodes and the service data $sd.
sub get_used_service_nodes {
- my ($online_nodes, $state, $node, $target) = @_;
+ my ($online_nodes, $sd) = @_;
+
+ my ($state, $node, $target) = $sd->@{qw(state node target)};
return (undef, undef) if $state eq 'stopped' || $state eq 'request_start';
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 29/40] add running flag to cluster service stats
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (27 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 28/40] usage: pass service data to get_used_service_nodes Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
` (10 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The running flag is needed to discriminate starting resources from
started ones and is a required parameter of the new add_service(...)
method of the resource scheduling bindings.
See the next patch, where the usage implementations pass the running
flag to the add_service(...) method, for more details.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Env/PVE2.pm | 1 +
src/PVE/HA/Manager.pm | 2 +-
src/PVE/HA/Sim/Hardware.pm | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 4dfb304e..a2173d95 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -549,6 +549,7 @@ my sub get_cluster_service_stats {
id => $id,
node => $nodename,
state => $state,
+ running => $state eq 'started',
type => $type,
usage => {},
};
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index d4b75ca9..152e18e5 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -289,7 +289,7 @@ sub recompute_online_node_usage {
next if $self->{ss}->{$sid};
# the migration target is not known for non-HA resources
- my $sd = { $service_stats->%{qw(node state)} };
+ my $sd = { $service_stats->{$sid}->%{qw(node state running)} };
$online_node_usage->add_service_usage($sid, $sd);
}
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index c9362fd6..c7e00bed 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -1165,6 +1165,7 @@ my sub get_cluster_service_stats {
$stats->{$sid} = {
node => $cfg->{node},
state => $cfg->{state},
+ running => $cfg->{state} eq 'started',
usage => {},
};
}
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (28 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 29/40] add running flag to cluster service stats Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
` (9 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The pve_static (and upcoming pve_dynamic) bindings expose the new
add_resource(...) method, which allows adding a resource in a single
call together with the additional running flag.
The running flag is needed to discriminate starting HA resources from
started ones, which is required to correctly account for HA resources
in the dynamic load usage implementation in the next patch.
This is because with the dynamic load usage, any HA resource that is
scheduled to start by the HA Manager in the same round will not be
accounted for in the next call to score_nodes_to_start_resource(...).
This is not a problem for the static load usage, because there the
current node usages are derived from the running resources on every
call anyway.
Passing only the HA resources' 'state' property is not enough, since
the HA Manager moves any HA resource from the 'request_start' state (or
through other transient states such as 'request_start_balance' and a
successful 'migrate'/'relocate') into the 'started' state.
The 'started' state is then picked up by the HA resource's LRM, which
actually starts the HA resource and, if successful, responds with a
'SUCCESS' LRM result. Only then does the HA Manager acknowledge this by
adding the running flag to the HA resource's state.
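The state/flag interplay described above can be condensed into a small predicate; this is a sketch under the assumptions of this commit message, not the actual bindings:

```rust
// A resource only counts as running once the LRM has confirmed a
// successful start, i.e. the manager state is 'started' and the
// 'running' flag (set after the SUCCESS LRM result) is present.
fn counts_as_running(state: &str, running_flag: bool) -> bool {
    state == "started" && running_flag
}
```

A resource in 'started' state without the flag is merely scheduled to start, which is exactly the case the dynamic usage accounting must treat differently.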
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/PVE/HA/Usage.pm | 12 +++++++-----
src/PVE/HA/Usage/Basic.pm | 9 ++++++++-
src/PVE/HA/Usage/Static.pm | 20 ++++++++++++++------
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 5f1ac226..822b884c 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,9 +33,8 @@ sub contains_node {
die "implement in subclass";
}
-# Logs a warning to $haenv upon failure, but does not die.
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
die "implement in subclass";
}
@@ -47,8 +46,11 @@ sub add_service_usage {
my $online_nodes = { map { $_ => 1 } $self->list_nodes() };
my ($current_node, $target_node) = get_used_service_nodes($online_nodes, $sd);
- $self->add_service_usage_to_node($current_node, $sid) if $current_node;
- $self->add_service_usage_to_node($target_node, $sid) if $target_node;
+ # some usage implementations need to discern whether a service is truly running;
+ # a service only has the 'running' flag set while in the 'started' state
+ my $running = ($sd->{state} eq 'started' && $sd->{running}) || defined($current_node);
+
+ $self->add_service($sid, $current_node, $target_node, $running);
}
sub remove_service_usage {
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
index 2584727b..5aa3ac05 100644
--- a/src/PVE/HA/Usage/Basic.pm
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -38,7 +38,7 @@ sub contains_node {
return defined($self->{nodes}->{$nodename});
}
-sub add_service_usage_to_node {
+my sub add_service_usage_to_node {
my ($self, $nodename, $sid) = @_;
if ($self->contains_node($nodename)) {
@@ -51,6 +51,13 @@ sub add_service_usage_to_node {
}
}
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ add_service_usage_to_node($self, $current_node, $sid) if defined($current_node);
+ add_service_usage_to_node($self, $target_node, $sid) if defined($target_node);
+}
+
sub remove_service_usage {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 6ff20794..835f4300 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -69,17 +69,25 @@ my sub get_service_usage {
return $service_stats;
}
-sub add_service_usage_to_node {
- my ($self, $nodename, $sid) = @_;
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
- $self->{'node-services'}->{$nodename}->{$sid} = 1;
+ # do not add services which do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
eval {
my $service_usage = get_service_usage($self, $sid);
- $self->{scheduler}->add_service_usage_to_node($nodename, $sid, $service_usage);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ current_node => $current_node,
+ target_node => $target_node,
+ };
+
+ $self->{scheduler}->add_service($sid, $service);
};
- $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
- if $@;
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
}
sub remove_service_usage {
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (29 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 30/40] usage: use add_service to add service usage to nodes Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
` (8 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The dynamic usage scheduler allows the HA Manager to make scheduling
decisions based on the current usage of the nodes and cluster resources
in addition to the maximum usage stats as reported by the PVE::HA::Env
implementation.
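For illustration, scoring on current usage roughly amounts to computing per-node load fractions from live stats plus the candidate service's load; a minimal sketch with assumed field shapes (not the real proxmox-resource-scheduling types):

```rust
// Assumed shapes for illustration: a node's current CPU usage and its
// total core count, as the dynamic node stats would report them.
struct NodeStats {
    cpu: f64,    // currently used CPU (in cores)
    maxcpu: u32, // total core count
}

// CPU fraction the node would be at after placing the service on it;
// lower is better when comparing candidate nodes.
fn cpu_fraction_after_start(node: &NodeStats, service_cpu: f64) -> f64 {
    (node.cpu + service_cpu) / node.maxcpu as f64
}
```

The same idea extends to memory; the real scheduler combines such per-metric fractions into a single node score.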
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- guard PVE::HA::Usage::Dynamic with my $have_dynamic_scheduling as
PVE::RS::ResourceScheduling::Dynamic might not be available (as
suggested by @Thomas)
- add add_service() impl
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Env.pm | 12 ++++
src/PVE/HA/Manager.pm | 21 +++++++
src/PVE/HA/Usage/Dynamic.pm | 110 ++++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Makefile | 2 +-
5 files changed, 145 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage/Dynamic.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 38d5d60b..75220a0b 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -42,6 +42,7 @@
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
/usr/share/perl5/PVE/HA/Usage/Static.pm
+/usr/share/perl5/PVE/HA/Usage/Dynamic.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
/usr/share/pve-manager/templates/default/fencing-body.html.hbs
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 3643292e..44c26854 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -312,12 +312,24 @@ sub get_static_service_stats {
return $self->{plug}->get_static_service_stats();
}
+sub get_dynamic_service_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_service_stats();
+}
+
sub get_static_node_stats {
my ($self) = @_;
return $self->{plug}->get_static_node_stats();
}
+sub get_dynamic_node_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_dynamic_node_stats();
+}
+
sub get_node_version {
my ($self, $node) = @_;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 152e18e5..6f7b431b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -21,6 +21,12 @@ eval {
$have_static_scheduling = 1;
};
+my $have_dynamic_scheduling;
+eval {
+ require PVE::HA::Usage::Dynamic;
+ $have_dynamic_scheduling = 1;
+};
+
## Variable Name & Abbreviations Convention
#
# The HA stack has some variables it uses frequently and thus abbreviates it such that it may be
@@ -264,6 +270,21 @@ sub recompute_online_node_usage {
'warning',
"fallback to 'basic' scheduler mode, init for 'static' failed - $@",
) if $@;
+ } elsif ($mode eq 'dynamic') {
+ if ($have_dynamic_scheduling) {
+ $online_node_usage = eval {
+ $service_stats = $haenv->get_dynamic_service_stats();
+ my $scheduler = PVE::HA::Usage::Dynamic->new($haenv, $service_stats);
+ $scheduler->add_node($_) for $online_nodes->@*;
+ return $scheduler;
+ };
+ } else {
+ $@ = "dynamic scheduling not available\n";
+ }
+ $haenv->log(
+ 'warning',
+ "fallback to 'basic' scheduler mode, init for 'dynamic' failed - $@",
+ ) if $@;
} elsif ($mode eq 'basic') {
# handled below in the general fall-back case
} else {
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
new file mode 100644
index 00000000..7e11715d
--- /dev/null
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -0,0 +1,110 @@
+package PVE::HA::Usage::Dynamic;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Dynamic;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv, $service_stats) = @_;
+
+ my $node_stats = eval { $haenv->get_dynamic_node_stats() };
+ die "did not get dynamic node usage information - $@" if $@;
+
+ my $scheduler = eval { PVE::RS::ResourceScheduling::Dynamic->new() };
+ die "unable to initialize dynamic scheduling - $@" if $@;
+
+ return bless {
+ 'node-stats' => $node_stats,
+ 'service-stats' => $service_stats,
+ haenv => $haenv,
+ scheduler => $scheduler,
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ my $stats = $self->{'node-stats'}->{$nodename}
+ or die "did not get dynamic node usage information for '$nodename'\n";
+ die "dynamic node usage information for '$nodename' missing cpu count\n" if !$stats->{maxcpu};
+ die "dynamic node usage information for '$nodename' missing memory\n" if !$stats->{maxmem};
+
+ eval { $self->{scheduler}->add_node($nodename, $stats); };
+ die "initializing dynamic node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+ my ($self, $sid) = @_;
+
+ my $service_stats = $self->{'service-stats'}->{$sid}->{usage}
+ or die "did not get dynamic service usage information for '$sid'\n";
+
+ return $service_stats;
+}
+
+sub add_service {
+ my ($self, $sid, $current_node, $target_node, $running) = @_;
+
+ # do not add services which do not put any usage on the nodes
+ return if !defined($current_node) && !defined($target_node);
+
+ eval {
+ my $service_usage = get_service_usage($self, $sid);
+
+ my $service = {
+ stats => $service_usage,
+ running => $running,
+ current_node => $current_node,
+ target_node => $target_node,
+ };
+
+ $self->{scheduler}->add_resource($sid, $service);
+ };
+ $self->{haenv}->log('warning', "unable to add service '$sid' - $@") if $@;
+}
+
+sub remove_service_usage {
+ my ($self, $sid) = @_;
+
+ eval { $self->{scheduler}->remove_resource($sid) };
+ $self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid) = @_;
+
+ my $score_list = eval {
+ my $service_usage = get_service_usage($self, $sid);
+ $self->{scheduler}->score_nodes_to_start_resource($service_usage);
+ };
+ $self->{haenv}
+ ->log('err', "unable to score nodes according to dynamic usage for service '$sid' - $@")
+ if $@;
+
+ # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+ return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index befdda60..5d51a9c1 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,5 +1,5 @@
SIM_SOURCES=Basic.pm
-SOURCES=${SIM_SOURCES} Static.pm
+SOURCES=${SIM_SOURCES} Static.pm Dynamic.pm
.PHONY: install
install:
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (30 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 31/40] usage: add dynamic usage scheduler Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
` (7 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the scheduler using the
dynamic usage information of the HA resources, with rebalance-on-start
cleared and set, respectively.
As the mechanisms for the scheduler with static and dynamic usage
information are mostly the same, these test cases verify only the
essential parts, which are:
- dynamic usage information is used correctly (for both test cases), and
- repeatedly scheduling resources with score_nodes_to_start_service(...)
correctly simulates that the previously scheduled HA resources are
already started
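The second point can be illustrated with a toy greedy placement (a sketch only, not the actual scheduler): each resource goes to the currently least-loaded node, and its load is accounted for immediately so that subsequent picks see it, mirroring how previously scheduled resources are simulated as already started.

```rust
// Toy greedy placement: `loads` holds (node name, current load); each
// service's load is added to its chosen node before the next pick,
// so back-to-back scheduling decisions spread out instead of piling
// onto the same node.
fn assign(mut loads: Vec<(String, f64)>, services: &[f64]) -> Vec<String> {
    let mut placement = Vec::new();
    for &s in services {
        // index of the currently least-loaded node
        let idx = loads
            .iter()
            .enumerate()
            .min_by(|a, b| (a.1).1.partial_cmp(&(b.1).1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        loads[idx].1 += s; // account for the service immediately
        placement.push(loads[idx].0.clone());
    }
    placement
}
```

Without the immediate accounting step, every service in a round would be scored against the same stale node loads.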
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/test/test-crs-dynamic-rebalance1/README | 3 +
src/test/test-crs-dynamic-rebalance1/cmdlist | 4 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 7 ++
.../hardware_status | 5 ++
.../test-crs-dynamic-rebalance1/log.expect | 88 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 7 ++
.../static_service_stats | 7 ++
src/test/test-crs-dynamic1/README | 4 +
src/test/test-crs-dynamic1/cmdlist | 4 +
src/test/test-crs-dynamic1/datacenter.cfg | 6 ++
.../test-crs-dynamic1/dynamic_service_stats | 3 +
src/test/test-crs-dynamic1/hardware_status | 5 ++
src/test/test-crs-dynamic1/log.expect | 51 +++++++++++
src/test/test-crs-dynamic1/manager_status | 1 +
src/test/test-crs-dynamic1/service_config | 3 +
.../test-crs-dynamic1/static_service_stats | 3 +
18 files changed, 209 insertions(+)
create mode 100644 src/test/test-crs-dynamic-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic1/README
create mode 100644 src/test/test-crs-dynamic1/cmdlist
create mode 100644 src/test/test-crs-dynamic1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic1/hardware_status
create mode 100644 src/test/test-crs-dynamic1/log.expect
create mode 100644 src/test/test-crs-dynamic1/manager_status
create mode 100644 src/test/test-crs-dynamic1/service_config
create mode 100644 src/test/test-crs-dynamic1/static_service_stats
diff --git a/src/test/test-crs-dynamic-rebalance1/README b/src/test/test-crs-dynamic-rebalance1/README
new file mode 100644
index 00000000..df0ba0a8
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/README
@@ -0,0 +1,3 @@
+Test rebalancing on start and how after a failed node the recovery gets
+balanced out for a small batch of HA resources with the dynamic usage
+information.
diff --git a/src/test/test-crs-dynamic-rebalance1/cmdlist b/src/test/test-crs-dynamic-rebalance1/cmdlist
new file mode 100644
index 00000000..eee0e40e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-crs-dynamic-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..0f76d24e
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-rebalance-on-start": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..5ef75ae0
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/dynamic_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "cpu": 1.3, "mem": 1073741824 },
+ "vm:102": { "cpu": 5.6, "mem": 3221225472 },
+ "vm:103": { "cpu": 0.5, "mem": 4000000000 },
+ "vm:104": { "cpu": 7.9, "mem": 2147483648 },
+ "vm:105": { "cpu": 3.2, "mem": 2684354560 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/hardware_status b/src/test/test-crs-dynamic-rebalance1/hardware_status
new file mode 100644
index 00000000..bfdbbf7b
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 256000000000 }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/log.expect b/src/test/test-crs-dynamic-rebalance1/log.expect
new file mode 100644
index 00000000..4017f7be
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/log.expect
@@ -0,0 +1,88 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node3'
+info 20 node1/crm: adding new service 'vm:102' on node 'node3'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: service vm:101: re-balance selected new node node1 for startup
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node1)
+info 20 node1/crm: service vm:102: re-balance selected new node node2 for startup
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance' (node = node3, target = node2)
+info 20 node1/crm: service vm:103: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:104: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service vm:105: re-balance selected current node node3 for startup
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: service vm:101 - start relocate to node 'node1'
+info 25 node3/lrm: service vm:101 - end relocate to node 'node1'
+info 25 node3/lrm: service vm:102 - start relocate to node 'node2'
+info 25 node3/lrm: service vm:102 - end relocate to node 'node2'
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 40 node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started' (node = node1)
+info 40 node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started' (node = node2)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 43 node2/lrm: starting service vm:102
+info 43 node2/lrm: service status vm:102 started
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'fence'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:104': state changed from 'fence' to 'recovery'
+info 240 node1/crm: service 'vm:105': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:104' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:104': state changed from 'recovery' to 'started' (node = node1)
+info 240 node1/crm: recover service 'vm:105' from fenced node 'node3' to node 'node1'
+info 240 node1/crm: service 'vm:105': state changed from 'recovery' to 'started' (node = node1)
+info 241 node1/lrm: starting service vm:103
+info 241 node1/lrm: service status vm:103 started
+info 241 node1/lrm: starting service vm:104
+info 241 node1/lrm: service status vm:104 started
+info 241 node1/lrm: starting service vm:105
+info 241 node1/lrm: service status vm:105 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-rebalance1/manager_status b/src/test/test-crs-dynamic-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-rebalance1/service_config b/src/test/test-crs-dynamic-rebalance1/service_config
new file mode 100644
index 00000000..3071f480
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/service_config
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "node": "node3", "state": "started" },
+ "vm:102": { "node": "node3", "state": "started" },
+ "vm:103": { "node": "node3", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-rebalance1/static_service_stats b/src/test/test-crs-dynamic-rebalance1/static_service_stats
new file mode 100644
index 00000000..a9e810d7
--- /dev/null
+++ b/src/test/test-crs-dynamic-rebalance1/static_service_stats
@@ -0,0 +1,7 @@
+{
+ "vm:101": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:103": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:104": { "maxcpu": 8, "maxmem": 4294967296 },
+ "vm:105": { "maxcpu": 8, "maxmem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic1/README b/src/test/test-crs-dynamic1/README
new file mode 100644
index 00000000..e6382130
--- /dev/null
+++ b/src/test/test-crs-dynamic1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with dynamic usage information.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-dynamic1/cmdlist b/src/test/test-crs-dynamic1/cmdlist
new file mode 100644
index 00000000..8684073c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-dynamic1/datacenter.cfg b/src/test/test-crs-dynamic1/datacenter.cfg
new file mode 100644
index 00000000..6a7fbc48
--- /dev/null
+++ b/src/test/test-crs-dynamic1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "dynamic"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic1/dynamic_service_stats b/src/test/test-crs-dynamic1/dynamic_service_stats
new file mode 100644
index 00000000..922ae9a6
--- /dev/null
+++ b/src/test/test-crs-dynamic1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "cpu": 5.9, "mem": 2744123392 }
+}
diff --git a/src/test/test-crs-dynamic1/hardware_status b/src/test/test-crs-dynamic1/hardware_status
new file mode 100644
index 00000000..bbe44a96
--- /dev/null
+++ b/src/test/test-crs-dynamic1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 100000000000 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 200000000000 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 32, "maxmem": 300000000000 }
+}
diff --git a/src/test/test-crs-dynamic1/log.expect b/src/test/test-crs-dynamic1/log.expect
new file mode 100644
index 00000000..b7e298e1
--- /dev/null
+++ b/src/test/test-crs-dynamic1/log.expect
@@ -0,0 +1,51 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'dynamic'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic1/manager_status b/src/test/test-crs-dynamic1/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic1/service_config b/src/test/test-crs-dynamic1/service_config
new file mode 100644
index 00000000..9c124471
--- /dev/null
+++ b/src/test/test-crs-dynamic1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-dynamic1/static_service_stats b/src/test/test-crs-dynamic1/static_service_stats
new file mode 100644
index 00000000..1819d24c
--- /dev/null
+++ b/src/test/test-crs-dynamic1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 8, "maxmem": 4294967296 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (31 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 32/40] test: add dynamic usage scheduler test cases Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
` (6 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
The name is misleading, because the function does not execute the HA
resource migration, but only queues the HA resource to change into the
state 'migrate' or 'relocate', which is then picked up by the
respective LRM for execution.
The term 'resource motion' also generalizes over the different actions
implied by the 'migrate' and 'relocate' commands and states.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Manager.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 6f7b431b..b954092b 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -416,7 +416,7 @@ sub read_lrm_status {
return ($results, $modes);
}
-sub execute_migration {
+sub queue_resource_motion {
my ($self, $cmd, $task, $sid, $target) = @_;
my ($haenv, $ss) = $self->@{qw(haenv ss)};
@@ -485,7 +485,7 @@ sub update_crm_commands {
"ignore crm command - service already on target node: $cmd",
);
} else {
- $self->execute_migration($cmd, $task, $sid, $node);
+ $self->queue_resource_motion($cmd, $task, $sid, $node);
}
}
} else {
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (32 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 33/40] manager: rename execute_migration to queue_resource_motion Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
` (5 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- no changes
src/PVE/HA/Manager.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b954092b..872d43c4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -91,11 +91,12 @@ sub update_crs_scheduler_mode {
my $haenv = $self->{haenv};
my $dc_cfg = $haenv->get_datacenter_settings();
+ my $crs_cfg = $dc_cfg->{crs};
- $self->{crs}->{rebalance_on_request_start} = !!$dc_cfg->{crs}->{'ha-rebalance-on-start'};
+ $self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
my $old_mode = $self->{crs}->{scheduler};
- my $new_mode = $dc_cfg->{crs}->{ha} || 'basic';
+ my $new_mode = $crs_cfg->{ha} || 'basic';
if (!defined($old_mode)) {
$haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 35/40] implement automatic rebalancing
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (33 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 34/40] manager: update_crs_scheduler_mode: factor out crs config Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
` (4 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
If the automatic load balancing system is enabled, it checks whether the
cluster node imbalance exceeds a user-defined threshold for a number of
consecutive HA Manager rounds ("hold duration"). Once the threshold has
been exceeded for that many consecutive rounds, it chooses the best
resource motion to reduce the cluster node imbalance and queues it, but
only if it improves the imbalance by at least a user-defined relative
improvement ("margin").
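The trigger condition described above can be sketched as follows. This
is an illustrative Python model of the logic, not the actual Perl
implementation; function names are made up, and the defaults mirror the
values configured in this patch:

```python
def should_rebalance(imbalance, state, threshold=0.7, hold_duration=3):
    """Trigger only once the imbalance threshold has been exceeded for
    `hold_duration` consecutive rounds; reset the counter otherwise."""
    if imbalance < threshold:
        state["sustained_rounds"] = 0
        return False
    state["sustained_rounds"] += 1
    if state["sustained_rounds"] < hold_duration:
        return False
    state["sustained_rounds"] = 0  # also reset after triggering
    return True

def improves_enough(imbalance, target_imbalance, margin=0.1):
    """Queue the best motion only if the relative improvement is at
    least the configured margin."""
    return (imbalance - target_imbalance) / imbalance >= margin
```

Resetting the counter both when the imbalance drops below the threshold
and after a triggered rebalance keeps a short spike from causing
repeated migrations.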
This patch introduces resource bundles, which ensure that HA resources
in strict positive resource affinity rules are considered as a whole
"bundle" instead of individual HA resources.
Specifically, active and stationary resource bundles are resource
bundles, that have at least one resource running and all resources
located on the same node. This distinction is needed as newly created
strict positive resource affinity rules may still require some resource
motions to enforce the rule.
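The grouping rules can be sketched like this. This is an illustrative
Python model, not the Perl code below; it assumes `services` maps each
SID to its state/node and `positive_affinity` maps each SID to its
strictly positive-affinitive peers:

```python
def active_stationary_bundles(services, positive_affinity):
    bundles = {}
    for sid in sorted(services):
        # only a started resource can lead or join an active bundle
        if services[sid]["state"] != "started":
            continue
        members = [sid]
        nodes = {services[sid]["node"]}
        for dep in positive_affinity.get(sid, ()):
            if dep not in services:
                continue
            dep_state = services[dep]["state"]
            if dep_state in ("migrate", "relocate"):
                break  # a moving member disqualifies the whole bundle
            if dep_state != "started":
                continue  # stopped members are left out of the bundle
            nodes.add(services[dep]["node"])
            members.append(dep)
        else:
            # stationary: all running members already on the same node
            if len(nodes) == 1:
                bundles[min(members)] = sorted(members)
    return bundles
```

The bundle is keyed by its leader, the alphabetically first running
member, so mutually affinitive resources all map to the same entry.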
Additionally, the migration candidate generation prunes any target
nodes, which do not adhere to the HA rules of these resource bundles
before scoring these migration candidates.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- add more context in patch message
- add comment to sustained_imbalance_round (as suggested by @Thomas)
- fix issue where resource bundle was created even though some dependent
resources were still migrating or relocating
- remove debug logging of node imbalance
- remove unused calculate_node_loads()
- remove select_best_balancing_migration{,_topsis}() from Static and
Dynamic and make it a proxy in PVE::HA::Usage
src/PVE/HA/Manager.pm | 177 +++++++++++++++++++++++++++++++++++-
src/PVE/HA/Usage.pm | 34 +++++++
src/PVE/HA/Usage/Dynamic.pm | 33 +++++++
src/PVE/HA/Usage/Static.pm | 33 +++++++
4 files changed, 276 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 872d43c4..73146b56 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -59,10 +59,17 @@ sub new {
my $self = bless {
haenv => $haenv,
- crs => {},
+ crs => {
+ auto_rebalance => {},
+ },
last_rules_digest => '',
last_groups_digest => '',
last_services_digest => '',
+ # used to track how many HA rounds the imbalance threshold has been exceeded
+ #
+ # this is not persisted for a CRM failover as in the meantime
+ # the usage statistics might have changed quite a bit already
+ sustained_imbalance_round => 0,
group_migration_round => 3, # wait a little bit
}, $class;
@@ -94,6 +101,13 @@ sub update_crs_scheduler_mode {
my $crs_cfg = $dc_cfg->{crs};
$self->{crs}->{rebalance_on_request_start} = !!$crs_cfg->{'ha-rebalance-on-start'};
+ $self->{crs}->{auto_rebalance}->{enable} = !!$crs_cfg->{'ha-auto-rebalance'};
+ $self->{crs}->{auto_rebalance}->{threshold} = $crs_cfg->{'ha-auto-rebalance-threshold'} // 0.7;
+ $self->{crs}->{auto_rebalance}->{method} = $crs_cfg->{'ha-auto-rebalance-method'}
+ // 'bruteforce';
+ $self->{crs}->{auto_rebalance}->{hold_duration} = $crs_cfg->{'ha-auto-rebalance-hold-duration'}
+ // 3;
+ $self->{crs}->{auto_rebalance}->{margin} = $crs_cfg->{'ha-auto-rebalance-margin'} // 0.1;
my $old_mode = $self->{crs}->{scheduler};
my $new_mode = $crs_cfg->{ha} || 'basic';
@@ -111,6 +125,150 @@ sub update_crs_scheduler_mode {
return;
}
+# Returns a hash of lists, which contain the running, non-moving HA resource
+# bundles, which are on the same node, implied by the strict positive resource
+# affinity rules.
+#
+# Each resource bundle has a leader, which is the alphabetically first running
+# HA resource in the resource bundle and also the key of each resource bundle
+# in the returned hash.
+sub get_active_stationary_resource_bundles {
+ my ($ss, $resource_affinity) = @_;
+
+ my $resource_bundles = {};
+OUTER: for my $sid (sort keys %$ss) {
+ # do not consider non-started resource as 'active' leading resource
+ next if $ss->{$sid}->{state} ne 'started';
+
+ my @resources = ($sid);
+ my $nodes = { $ss->{$sid}->{node} => 1 };
+
+ my ($dependent_resources) = get_affinitive_resources($resource_affinity, $sid);
+ if (%$dependent_resources) {
+ for my $csid (keys %$dependent_resources) {
+ next if !defined($ss->{$csid});
+ my ($state, $node) = $ss->{$csid}->@{qw(state node)};
+
+ # do not consider stationary bundle if a dependent resource moves
+ next OUTER if $state eq 'migrate' || $state eq 'relocate';
+ # do not add non-started resource to active bundle
+ next if $state ne 'started';
+
+ $nodes->{$node} = 1;
+
+ push @resources, $csid;
+ }
+
+ @resources = sort @resources;
+ }
+
+ # skip resource bundles, which are not on the same node yet
+ next if keys %$nodes > 1;
+
+ my $leader_sid = $resources[0];
+
+ $resource_bundles->{$leader_sid} = \@resources;
+ }
+
+ return $resource_bundles;
+}
+
+# Returns a hash of hashes, where each item contains the resource bundle's
+# leader, the list of HA resources in the resource bundle, and the list of
+# possible nodes to migrate to.
+sub get_resource_migration_candidates {
+ my ($self) = @_;
+
+ my ($ss, $compiled_rules, $online_node_usage) =
+ $self->@{qw(ss compiled_rules online_node_usage)};
+ my ($node_affinity, $resource_affinity) =
+ $compiled_rules->@{qw(node-affinity resource-affinity)};
+
+ my $resource_bundles = get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ my @compact_migration_candidates = ();
+ for my $leader_sid (sort keys %$resource_bundles) {
+ my $current_leader_node = $ss->{$leader_sid}->{node};
+ my $online_nodes = { map { $_ => 1 } $online_node_usage->list_nodes() };
+
+ my (undef, $target_nodes) = get_node_affinity($node_affinity, $leader_sid, $online_nodes);
+ my ($together, $separate) =
+ get_resource_affinity($resource_affinity, $leader_sid, $ss, $online_nodes);
+ apply_negative_resource_affinity($separate, $target_nodes);
+
+ delete $target_nodes->{$current_leader_node};
+
+ next if !%$target_nodes;
+
+ push @compact_migration_candidates,
+ {
+ leader => $leader_sid,
+ nodes => [sort keys %$target_nodes],
+ resources => $resource_bundles->{$leader_sid},
+ };
+ }
+
+ return \@compact_migration_candidates;
+}
+
+sub load_balance {
+ my ($self) = @_;
+
+ my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
+ my ($auto_rebalance_opts) = $crs->{auto_rebalance};
+
+ return if !$auto_rebalance_opts->{enable};
+ return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';
+ return if $self->any_resource_motion_queued_or_running();
+
+ my ($threshold, $method, $hold_duration, $margin) =
+ $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
+
+ my $imbalance = $online_node_usage->calculate_node_imbalance();
+
+ # do not load balance unless imbalance threshold has been exceeded
+ # consecutively for $hold_duration calls to load_balance()
+ if ($imbalance < $threshold) {
+ $self->{sustained_imbalance_round} = 0;
+ return;
+ } else {
+ $self->{sustained_imbalance_round}++;
+ return if $self->{sustained_imbalance_round} < $hold_duration;
+ $self->{sustained_imbalance_round} = 0;
+ }
+
+ my $candidates = $self->get_resource_migration_candidates();
+
+ my $result;
+ if ($method eq 'bruteforce') {
+ $result = $online_node_usage->select_best_balancing_migration($candidates);
+ } elsif ($method eq 'topsis') {
+ $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
+ }
+
+ # happens if $candidates is empty or $method isn't handled above
+ return if !$result;
+
+ my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
+
+ my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
+ return if $relative_change < $margin;
+
+ my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
+
+ my (undef, $type, $id) = $haenv->parse_sid($sid);
+ my $task = $type eq 'vm' ? "migrate" : "relocate";
+ my $cmd = "$task $sid $target";
+
+ my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
+ $haenv->log(
+ 'info',
+ "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
+ );
+
+ $self->queue_resource_motion($cmd, $task, $sid, $target);
+}
+
sub cleanup {
my ($self) = @_;
@@ -463,6 +621,21 @@ sub queue_resource_motion {
}
}
+sub any_resource_motion_queued_or_running {
+ my ($self) = @_;
+
+ my ($ss) = $self->@{qw(ss)};
+
+ for my $sid (keys %$ss) {
+ my ($cmd, $state) = $ss->{$sid}->@{qw(cmd state)};
+
+ return 1 if $state eq 'migrate' || $state eq 'relocate';
+ return 1 if defined($cmd) && ($cmd->[0] eq 'migrate' || $cmd->[0] eq 'relocate');
+ }
+
+ return 0;
+}
+
# read new crm commands and save them into crm master status
sub update_crm_commands {
my ($self) = @_;
@@ -746,6 +919,8 @@ sub manage {
$self->update_crm_commands();
+ $self->load_balance();
+
for (;;) {
my $repeat = 0;
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 822b884c..dc029e86 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -59,6 +59,40 @@ sub remove_service_usage {
die "implement in subclass";
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ die "implement in subclass";
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ die "implement in subclass";
+}
+
+sub select_best_balancing_migration_topsis {
+ my ($self, $migration_candidates) = @_;
+
+ my $migrations = $self->score_best_balancing_migrations_topsis($migration_candidates, 1);
+
+ return $migrations->[0];
+}
+
# Returns a hash with $nodename => $score pairs. A lower $score is better.
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Dynamic.pm b/src/PVE/HA/Usage/Dynamic.pm
index 7e11715d..a8adfe83 100644
--- a/src/PVE/HA/Usage/Dynamic.pm
+++ b/src/PVE/HA/Usage/Dynamic.pm
@@ -92,6 +92,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate dynamic node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index 835f4300..92bfaaa7 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -99,6 +99,39 @@ sub remove_service_usage {
$self->{haenv}->log('warning', "unable to remove service '$sid' usage - $@") if $@;
}
+sub calculate_node_imbalance {
+ my ($self) = @_;
+
+ my $node_imbalance = eval { $self->{scheduler}->calculate_node_imbalance() };
+ $self->{haenv}->log('warning', "unable to calculate static node imbalance - $@") if $@;
+
+ return $node_imbalance // 0.0;
+}
+
+sub score_best_balancing_migrations {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
+sub score_best_balancing_migrations_topsis {
+ my ($self, $migration_candidates, $limit) = @_;
+
+ my $migrations = eval {
+ $self->{scheduler}
+ ->score_best_balancing_migration_candidates_topsis($migration_candidates, $limit);
+ };
+ $self->{haenv}->log('warning', "unable to score best balancing migration - $@") if $@;
+
+ return $migrations;
+}
+
sub score_nodes_to_start_service {
my ($self, $sid) = @_;
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (34 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 35/40] implement automatic rebalancing Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
` (3 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document which resource bundles count as active and
stationary and ensure that get_active_stationary_resource_bundles(...)
produces the correct active, stationary resource bundles.
This is especially important, because these resource bundles are used
for the load balancing candidate generation, which is passed to
score_best_balancing_migration_candidates($candidates, ...). The
PVE::HA::Usage::{Static,Dynamic} implementations validate these
candidates and fail with a user-visible error message.
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
src/test/Makefile | 1 +
src/test/test_resource_bundles.pl | 234 ++++++++++++++++++++++++++++++
2 files changed, 235 insertions(+)
create mode 100755 src/test/test_resource_bundles.pl
diff --git a/src/test/Makefile b/src/test/Makefile
index 6da9e100..f72b755b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -6,6 +6,7 @@ test:
@echo "-- start regression tests --"
./test_failover1.pl
./test_rules_config.pl
+ ./test_resource_bundles.pl
./ha-tester.pl
./test_fence_config.pl
@echo "-- end regression tests (success) --"
diff --git a/src/test/test_resource_bundles.pl b/src/test/test_resource_bundles.pl
new file mode 100755
index 00000000..d38dc516
--- /dev/null
+++ b/src/test/test_resource_bundles.pl
@@ -0,0 +1,234 @@
+#!/usr/bin/perl
+
+use v5.36;
+
+use lib qw(..);
+
+use Test::More;
+
+use PVE::HA::Manager;
+
+my $get_active_stationary_resource_bundle_tests = [
+ {
+ description => "trivial resource bundles",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {},
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101',
+ ],
+ 'vm:102' => [
+ 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "simple resource bundle",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:102',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with first resource stopped",
+ services => {
+ 'vm:101' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:102' => [
+ 'vm:102', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with some stopped resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'stopped',
+ node => 'node1',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {
+ 'vm:101' => [
+ 'vm:101', 'vm:103',
+ ],
+ },
+ },
+ {
+ description => "resource bundle with moving resources",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'migrate',
+ node => 'node2',
+ target => 'node1',
+ },
+ 'vm:103' => {
+ state => 'relocate',
+ node => 'node3',
+ target => 'node1',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+ # might happen if the resource bundle is generated even before the HA Manager
+ # puts the HA resources in migrate/relocate to make them adhere to the HA rules
+ {
+ description => "resource bundle with resources on different nodes",
+ services => {
+ 'vm:101' => {
+ state => 'started',
+ node => 'node1',
+ },
+ 'vm:102' => {
+ state => 'started',
+ node => 'node2',
+ },
+ 'vm:103' => {
+ state => 'started',
+ node => 'node3',
+ },
+ },
+ resource_affinity => {
+ positive => {
+ 'vm:101' => {
+ 'vm:102' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:102' => {
+ 'vm:101' => 1,
+ 'vm:103' => 1,
+ },
+ 'vm:103' => {
+ 'vm:101' => 1,
+ 'vm:102' => 1,
+ },
+ },
+ negative => {},
+ },
+ resource_bundles => {},
+ },
+];
+
+my $tests = [
+ @$get_active_stationary_resource_bundle_tests,
+];
+
+plan(tests => scalar($tests->@*));
+
+for my $case ($get_active_stationary_resource_bundle_tests->@*) {
+ my ($ss, $resource_affinity) = $case->@{qw(services resource_affinity)};
+
+ my $result = PVE::HA::Manager::get_active_stationary_resource_bundles($ss, $resource_affinity);
+
+ is_deeply($result, $case->{resource_bundles}, $case->{description});
+}
+
+done_testing();
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (35 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 36/40] test: add resource bundle generation test cases Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
` (2 subsequent siblings)
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document the basic behavior of the automatic load
rebalancer using the dynamic usage stats.
As an overview:
- Case 0: rebalancing system is inactive for no configured HA resources
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converges once
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through dynamic
changes in their usage
- Case 4: rebalancing system doesn't trigger a migration if the node
imbalance threshold is exceeded once but isn't sustained for at least
the set hold duration
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../test-crs-dynamic-auto-rebalance0/README | 2 +
.../test-crs-dynamic-auto-rebalance0/cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../test-crs-dynamic-auto-rebalance1/README | 7 ++
.../test-crs-dynamic-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-dynamic-auto-rebalance2/README | 4 +
.../test-crs-dynamic-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-dynamic-auto-rebalance3/README | 4 +
.../test-crs-dynamic-auto-rebalance3/cmdlist | 16 ++++
.../datacenter.cfg | 7 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 80 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
.../test-crs-dynamic-auto-rebalance4/README | 11 +++
.../test-crs-dynamic-auto-rebalance4/cmdlist | 13 +++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
45 files changed, 451 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/README b/src/test/test-crs-dynamic-auto-rebalance0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
new file mode 100644
index 00000000..6526c203
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-threshold": 0.7
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/log.expect b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/manager_status b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/service_config b/src/test/test-crs-dynamic-auto-rebalance0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/README b/src/test/test-crs-dynamic-auto-rebalance1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that
+can exceed the minimum imbalance improvement threshold, i.e., improve the
+imbalance enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/service_config b/src/test/test-crs-dynamic-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/README b/src/test/test-crs-dynamic-auto-rebalance2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running, homogeneous HA resources from a single node to
+other cluster nodes to reach a minimum cluster node imbalance in a
+homogeneous cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/service_config b/src/test/test-crs-dynamic-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/README b/src/test/test-crs-dynamic-auto-rebalance3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with different dynamic usages, where
+the dynamic usage stats of some HA resources change over time, to reach minimum
+cluster node imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
new file mode 100644
index 00000000..a07fe721
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 260 node1/crm: auto rebalance - migrate vm:103 to node1 (expected target imbalance: 0.4)
+info 260 node1/crm: got crm command: migrate vm:103 node1
+info 260 node1/crm: migrate service 'vm:103' to node 'node1'
+info 260 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 263 node2/lrm: service vm:103 - start migrate to node 'node1'
+info 263 node2/lrm: service vm:103 - end migrate to node 'node1'
+info 280 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node1)
+info 281 node1/lrm: starting service vm:103
+info 281 node1/lrm: service status vm:103 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/service_config b/src/test/test-crs-dynamic-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/README b/src/test/test-crs-dynamic-auto-rebalance4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources that cause a
+transient spike in their dynamic usage, one that makes the nodes exceed the
+imbalance threshold but subsides below it again before the hold duration
+expires.
+
+This test relies on the fact that a command batch from the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage right after the simulated spike, so that the current
+imbalance falls below the threshold again, undercuts the hold duration by one
+HA round and therefore prevents triggering a rebalancing migration.
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..14059a3e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/service_config b/src/test/test-crs-dynamic-auto-rebalance4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 38/40] test: add static automatic rebalancing system test cases
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (36 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 37/40] test: add dynamic automatic rebalancing system " Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases are derivatives of the dynamic automatic rebalancing
system test cases 1 to 3 and ensure that the automatic rebalancing
system provides the same basic functionality when using static usage
information.
The other dynamic usage test cases are not included here, because they
are invariant to the provided usage information and only exercise
further edge cases.
As an overview:
- Case 1: rebalancing system doesn't trigger any rebalancing migrations
for a single, configured HA resource
- Case 2: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance and converge if
the imbalance falls below the threshold
- Case 3: rebalancing system triggers migrations if the running HA
resources cause a significant node imbalance through changes
in their static usage
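
The "node imbalance" and "improvement threshold" logic these cases exercise
is not part of this patch itself. As an illustration only (the actual
scheduler implementation lives in the Rust crates and may differ), a node
imbalance check along these lines can be sketched with the coefficient of
variation of per-node usage fractions and a hypothetical trigger threshold:

```python
def node_imbalance(usages):
    """Coefficient of variation (stddev / mean) of per-node usage fractions.

    A perfectly balanced cluster yields 0.0; the more load concentrates on
    a few nodes, the larger the value gets.
    """
    mean = sum(usages) / len(usages)
    if mean == 0:
        return 0.0
    variance = sum((u - mean) ** 2 for u in usages) / len(usages)
    return (variance ** 0.5) / mean

def should_rebalance(usages, trigger=0.25):
    # 'trigger' is a made-up threshold for this sketch, not a real
    # ha-manager or datacenter.cfg setting.
    return node_imbalance(usages) > trigger
```

With this sketch, a cluster like test case 2 (all resources on one node,
e.g. usage fractions `[0.9, 0.1, 0.1]`) triggers rebalancing, while an even
spread (`[0.3, 0.3, 0.3]`) does not.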
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../test-crs-static-auto-rebalance1/README | 7 ++
.../test-crs-static-auto-rebalance1/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../test-crs-static-auto-rebalance2/README | 4 +
.../test-crs-static-auto-rebalance2/cmdlist | 3 +
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../test-crs-static-auto-rebalance3/README | 3 +
.../test-crs-static-auto-rebalance3/cmdlist | 15 ++++
.../datacenter.cfg | 7 ++
.../hardware_status | 5 ++
.../log.expect | 79 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
24 files changed, 273 insertions(+)
create mode 100644 src/test/test-crs-static-auto-rebalance1/README
create mode 100644 src/test/test-crs-static-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance2/README
create mode 100644 src/test/test-crs-static-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-static-auto-rebalance3/README
create mode 100644 src/test/test-crs-static-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-static-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-static-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-static-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-static-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-static-auto-rebalance3/static_service_stats
diff --git a/src/test/test-crs-static-auto-rebalance1/README b/src/test/test-crs-static-auto-rebalance1/README
new file mode 100644
index 00000000..8f97ac55
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with static usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-static-auto-rebalance1/cmdlist b/src/test/test-crs-static-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance1/datacenter.cfg b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance1/hardware_status b/src/test/test-crs-static-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/log.expect b/src/test/test-crs-static-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d2c27bec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance1/manager_status b/src/test/test-crs-static-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance1/service_config b/src/test/test-crs-static-auto-rebalance1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance1/static_service_stats b/src/test/test-crs-static-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/README b/src/test/test-crs-static-auto-rebalance2/README
new file mode 100644
index 00000000..1d1b9d6e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running, homogeneous HA resources on a single node to other
+cluster nodes to reach a minimum cluster node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-static-auto-rebalance2/cmdlist b/src/test/test-crs-static-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance2/datacenter.cfg b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance2/hardware_status b/src/test/test-crs-static-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/log.expect b/src/test/test-crs-static-auto-rebalance2/log.expect
new file mode 100644
index 00000000..3df96d83
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance2/manager_status b/src/test/test-crs-static-auto-rebalance2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-auto-rebalance2/service_config b/src/test/test-crs-static-auto-rebalance2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance2/static_service_stats b/src/test/test-crs-static-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/README b/src/test/test-crs-static-auto-rebalance3/README
new file mode 100644
index 00000000..2f57dac2
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/README
@@ -0,0 +1,3 @@
+Test that the auto rebalance system with static usage information will auto
+rebalance multiple running HA resources, where the static usage stats of some
+HA resources change over time, to reach minimum cluster node imbalance.
diff --git a/src/test/test-crs-static-auto-rebalance3/cmdlist b/src/test/test-crs-static-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..f18798b0
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/cmdlist
@@ -0,0 +1,15 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:106 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:107 set-static-stats maxcpu 8.0 maxmem 8192"
+ ],
+ [
+ "service vm:101 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:102 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:103 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:104 set-static-stats maxcpu 1.0 maxmem 1024",
+ "service vm:105 set-static-stats maxcpu 1.0 maxmem 1024"
+ ]
+]
diff --git a/src/test/test-crs-static-auto-rebalance3/datacenter.cfg b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..efd8e67a
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "static",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-static-auto-rebalance3/hardware_status b/src/test/test-crs-static-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/log.expect b/src/test/test-crs-static-auto-rebalance3/log.expect
new file mode 100644
index 00000000..ddb4e5ec
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/log.expect
@@ -0,0 +1,79 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:106 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:107 set-static-stats maxcpu 8.0 maxmem 8192
+info 160 node1/crm: auto rebalance - migrate vm:105 to node1 (expected target imbalance: 0.47)
+info 160 node1/crm: got crm command: migrate vm:105 node1
+info 160 node1/crm: migrate service 'vm:105' to node 'node1'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node1'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node1'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node1)
+info 181 node1/lrm: starting service vm:105
+info 181 node1/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:102 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:103 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:104 set-static-stats maxcpu 1.0 maxmem 1024
+info 220 cmdlist: execute service vm:105 set-static-stats maxcpu 1.0 maxmem 1024
+info 260 node1/crm: auto rebalance - migrate vm:106 to node2 (expected target imbalance: 0.42)
+info 260 node1/crm: got crm command: migrate vm:106 node2
+info 260 node1/crm: migrate service 'vm:106' to node 'node2'
+info 260 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 265 node3/lrm: service vm:106 - start migrate to node 'node2'
+info 265 node3/lrm: service vm:106 - end migrate to node 'node2'
+info 280 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node2)
+info 283 node2/lrm: starting service vm:106
+info 283 node2/lrm: service status vm:106 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-static-auto-rebalance3/manager_status b/src/test/test-crs-static-auto-rebalance3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-auto-rebalance3/service_config b/src/test/test-crs-static-auto-rebalance3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-static-auto-rebalance3/static_service_stats b/src/test/test-crs-static-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..560a6fe8
--- /dev/null
+++ b/src/test/test-crs-static-auto-rebalance3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:105": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:106": { "maxcpu": 2.0, "maxmem": 2147483648 },
+ "vm:107": { "maxcpu": 2.0, "maxmem": 2147483648 }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (37 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 38/40] test: add static " Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
2026-03-24 18:30 ` [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases are clones of the dynamic automatic rebalancing system
test cases 0 through 4 and ensure that the automatic rebalancing system
provides the same basic functionality when using the TOPSIS method.
The expected outputs are exactly the same, except for test case 3, where
the second migration changes from
vm:103 to node1 with an expected target imbalance of 0.40
to
vm:103 to node3 with an expected target imbalance of 0.43.
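
For context, TOPSIS is a standard multi-criteria ranking method: normalize
the criteria matrix, weight it, and rank alternatives by their relative
closeness to the ideal solution. The sketch below is the textbook
formulation only, not the implementation in the scheduler crate, which may
weight and normalize differently:

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns).

    'benefit[j]' is True when a higher value in column j is better
    (e.g. free capacity) and False when lower is better (e.g. usage).
    Returns one closeness score in [0, 1] per alternative; higher is better.
    """
    ncols = len(weights)
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(ncols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(ncols)]
                for row in matrix]
    # ideal/anti-ideal point per criterion, direction depends on benefit[j]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*weighted))]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(zip(*weighted))]
    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((row[j] - ideal[j]) ** 2 for j in range(ncols)))
        d_neg = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(ncols)))
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    return scores
```

For two candidate target nodes described by (cpu usage, memory usage), the
less loaded node ranks higher when both criteria are costs.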
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../README | 2 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 1 +
.../hardware_status | 5 ++
.../log.expect | 11 +++
.../manager_status | 1 +
.../service_config | 1 +
.../static_service_stats | 1 +
.../README | 7 ++
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 3 +
.../hardware_status | 5 ++
.../log.expect | 25 ++++++
.../manager_status | 1 +
.../service_config | 3 +
.../static_service_stats | 3 +
.../README | 4 +
.../cmdlist | 3 +
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 4 +
.../cmdlist | 16 ++++
.../datacenter.cfg | 8 ++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 80 +++++++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
.../README | 11 +++
.../cmdlist | 13 +++
.../datacenter.cfg | 9 +++
.../dynamic_service_stats | 9 +++
.../hardware_status | 5 ++
.../log.expect | 59 ++++++++++++++
.../manager_status | 1 +
.../service_config | 9 +++
.../static_service_stats | 9 +++
45 files changed, 455 insertions(+)
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/README
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
create mode 100644 src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/README b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
new file mode 100644
index 00000000..2b349566
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/README
@@ -0,0 +1,2 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger if no HA resources are configured in a homogeneous node cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/dynamic_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
new file mode 100644
index 00000000..27eed635
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/log.expect
@@ -0,0 +1,11 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis0/static_service_stats
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/README b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
new file mode 100644
index 00000000..086bee20
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information does not
+trigger for a single running HA resource in a homogeneous cluster.
+
+Even though the single running HA resource will create a high node imbalance,
+which would trigger a rebalancing migration, there is no such migration that can
+exceed the minimum imbalance improvement threshold, i.e., improve the imbalance
+enough.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
new file mode 100644
index 00000000..50dd4901
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/dynamic_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
new file mode 100644
index 00000000..7f97253b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 4, "maxmem": 17179869184 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
new file mode 100644
index 00000000..e6ee4402
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/log.expect
@@ -0,0 +1,25 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
new file mode 100644
index 00000000..a0ab66d2
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
new file mode 100644
index 00000000..e1bf0839
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/README b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
new file mode 100644
index 00000000..93b81081
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple identical running HA resources from a single node to other
+cluster nodes in order to reach a minimal node imbalance in the homogeneous
+cluster.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
new file mode 100644
index 00000000..f01fd768
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:102": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:103": { "cpu": 1.0, "mem": 4294967296 },
+ "vm:104": { "cpu": 1.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
new file mode 100644
index 00000000..ce8cf0eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 34359738368 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
new file mode 100644
index 00000000..c2bc6463
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 80 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.94)
+info 80 node1/crm: got crm command: migrate vm:101 node2
+info 80 node1/crm: migrate service 'vm:101' to node 'node2'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 83 node2/lrm: got lock 'ha_agent_node2_lock'
+info 83 node2/lrm: status change wait_for_agent_lock => active
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 103 node2/lrm: starting service vm:101
+info 103 node2/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:102 to node3 (expected target imbalance: 0.35)
+info 160 node1/crm: got crm command: migrate vm:102 node3
+info 160 node1/crm: migrate service 'vm:102' to node 'node3'
+info 160 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 161 node1/lrm: service vm:102 - start migrate to node 'node3'
+info 161 node1/lrm: service vm:102 - end migrate to node 'node3'
+info 165 node3/lrm: got lock 'ha_agent_node3_lock'
+info 165 node3/lrm: status change wait_for_agent_lock => active
+info 180 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
+info 185 node3/lrm: starting service vm:102
+info 185 node3/lrm: service status vm:102 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
new file mode 100644
index 00000000..b5960cb1
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
new file mode 100644
index 00000000..6cf8c106
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis2/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 2.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 2.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/README b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
new file mode 100644
index 00000000..2b7aa8c6
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/README
@@ -0,0 +1,4 @@
+Test that the auto rebalance system with dynamic usage information will auto
+rebalance multiple running HA resources with differing dynamic usages, some of
+which change over time, in order to reach a minimal cluster node
+imbalance.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
new file mode 100644
index 00000000..239bf871
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/cmdlist
@@ -0,0 +1,16 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:101 set-dynamic-stats mem 1011",
+ "service vm:103 set-dynamic-stats cpu 3.9 mem 6517",
+ "service vm:104 set-dynamic-stats cpu 6.7 mem 8001",
+ "service vm:105 set-dynamic-stats cpu 1.8 mem 1201",
+ "service vm:106 set-dynamic-stats cpu 2.1 mem 1211",
+ "service vm:107 set-dynamic-stats cpu 0.9 mem 1191"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
new file mode 100644
index 00000000..4bb7c02b
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis"
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
new file mode 100644
index 00000000..4aaddd39
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/log.expect
@@ -0,0 +1,80 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 160 node1/crm: auto rebalance - migrate vm:105 to node2 (expected target imbalance: 0.42)
+info 160 node1/crm: got crm command: migrate vm:105 node2
+info 160 node1/crm: migrate service 'vm:105' to node 'node2'
+info 160 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:105 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:105 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:105
+info 183 node2/lrm: service status vm:105 started
+info 220 cmdlist: execute service vm:101 set-dynamic-stats mem 1011
+info 220 cmdlist: execute service vm:103 set-dynamic-stats cpu 3.9 mem 6517
+info 220 cmdlist: execute service vm:104 set-dynamic-stats cpu 6.7 mem 8001
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 1.8 mem 1201
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.1 mem 1211
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 0.9 mem 1191
+info 260 node1/crm: auto rebalance - migrate vm:103 to node3 (expected target imbalance: 0.43)
+info 260 node1/crm: got crm command: migrate vm:103 node3
+info 260 node1/crm: migrate service 'vm:103' to node 'node3'
+info 260 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node2, target = node3)
+info 263 node2/lrm: service vm:103 - start migrate to node 'node3'
+info 263 node2/lrm: service vm:103 - end migrate to node 'node3'
+info 280 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 285 node3/lrm: starting service vm:103
+info 285 node3/lrm: service status vm:103 started
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis3/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/README b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
new file mode 100644
index 00000000..e23fcf8d
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/README
@@ -0,0 +1,11 @@
+Test that the auto rebalance system with dynamic usage information will not
+trigger any rebalancing migrations for running HA resources whose dynamic
+usage spikes transiently, pushing the nodes above the imbalance threshold,
+but falls back below the threshold before the hold duration
+expires.
+
+This test relies on the fact that every command batch in the `cmdlist` file is
+issued every 5 HA rounds. With the hold duration set to 6 HA rounds, resetting
+the dynamic usage to values below the imbalance threshold right after the
+simulated spike undercuts the hold duration by one HA round, so no rebalancing
+migration is triggered.
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
new file mode 100644
index 00000000..e8f5a22f
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/cmdlist
@@ -0,0 +1,13 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:105 set-dynamic-stats cpu 7.8 mem 7912",
+ "service vm:106 set-dynamic-stats cpu 5.7 mem 8192",
+ "service vm:107 set-dynamic-stats cpu 6.0 mem 8011"
+ ],
+ [
+ "service vm:105 set-dynamic-stats cpu 3.0 mem 5192",
+ "service vm:106 set-dynamic-stats cpu 2.9 mem 2500",
+ "service vm:107 set-dynamic-stats cpu 2.1 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
new file mode 100644
index 00000000..0fb3fdc3
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-method": "topsis",
+ "ha-auto-rebalance-hold-duration": 6
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
new file mode 100644
index 00000000..77e72c16
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/dynamic_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 5444206592 },
+ "vm:102": { "cpu": 1.2, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.8, "mem": 5444206592 },
+ "vm:104": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:105": { "cpu": 3.0, "mem": 5444206592 },
+ "vm:106": { "cpu": 2.9, "mem": 2621440000 },
+ "vm:107": { "cpu": 2.1, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
new file mode 100644
index 00000000..4eb53bd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node2'
+info 20 node1/crm: adding new service 'vm:105' on node 'node3'
+info 20 node1/crm: adding new service 'vm:106' on node 'node3'
+info 20 node1/crm: adding new service 'vm:107' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:105': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:106': state changed from 'request_start' to 'started' (node = node3)
+info 20 node1/crm: service 'vm:107': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 25 node3/lrm: starting service vm:106
+info 25 node3/lrm: service status vm:106 started
+info 25 node3/lrm: starting service vm:107
+info 25 node3/lrm: service status vm:107 started
+info 120 cmdlist: execute service vm:105 set-dynamic-stats cpu 7.8 mem 7912
+info 120 cmdlist: execute service vm:106 set-dynamic-stats cpu 5.7 mem 8192
+info 120 cmdlist: execute service vm:107 set-dynamic-stats cpu 6.0 mem 8011
+info 220 cmdlist: execute service vm:105 set-dynamic-stats cpu 3.0 mem 5192
+info 220 cmdlist: execute service vm:106 set-dynamic-stats cpu 2.9 mem 2500
+info 220 cmdlist: execute service vm:107 set-dynamic-stats cpu 2.1 mem 4096
+info 820 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
new file mode 100644
index 00000000..a44ddd0e
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/service_config
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node2", "state": "started" },
+ "vm:105": { "node": "node3", "state": "started" },
+ "vm:106": { "node": "node3", "state": "started" },
+ "vm:107": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
new file mode 100644
index 00000000..7a52ea73
--- /dev/null
+++ b/src/test/test-crs-dynamic-auto-rebalance-topsis4/static_service_stats
@@ -0,0 +1,9 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 4.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:105": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:106": { "maxcpu": 6.0, "maxmem": 8589934592 },
+ "vm:107": { "maxcpu": 6.0, "maxmem": 8589934592 }
+}
--
2.47.3
^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH ha-manager v2 40/40] test: add automatic rebalancing system test cases with affinity rules
2026-03-24 18:29 [PATCH cluster/ha-manager/perl-rs/proxmox v2 00/40] dynamic scheduler + load rebalancer Daniel Kral
` (38 preceding siblings ...)
2026-03-24 18:30 ` [PATCH ha-manager v2 39/40] test: add automatic rebalancing system test cases with TOPSIS method Daniel Kral
@ 2026-03-24 18:30 ` Daniel Kral
39 siblings, 0 replies; 64+ messages in thread
From: Daniel Kral @ 2026-03-24 18:30 UTC (permalink / raw)
To: pve-devel
These test cases document and verify some behaviors of the automatic
rebalancing system in combination with HA affinity rules.
All of these test cases use only the dynamic usage information and the
bruteforce method, as waiting on ongoing migrations and candidate generation
are invariant to those parameters.
As an overview:
- Case 1: rebalancing system acknowledges node affinity rules
- Case 2: rebalancing system considers HA resources in strict positive
resource affinity rules as a single unit (a resource bundle)
and will not split them apart
- Case 3: rebalancing system will wait on the migration of a not-yet
enforced strict positive resource affinity rule, i.e., the
HA resources still need to migrate to their common node
- Case 4: rebalancing system will acknowledge strict negative resource
affinity rules, but will still try to minimize the node
imbalance as much as possible
Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes v1 -> v2:
- new!
.../README | 7 +++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 5 ++
.../hardware_status | 5 ++
.../log.expect | 49 +++++++++++++++
.../manager_status | 1 +
.../rules_config | 4 ++
.../service_config | 5 ++
.../static_service_stats | 5 ++
.../README | 12 ++++
.../cmdlist | 8 +++
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 4 ++
.../hardware_status | 5 ++
.../log.expect | 53 +++++++++++++++++
.../manager_status | 1 +
.../rules_config | 3 +
.../service_config | 4 ++
.../static_service_stats | 4 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 8 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 31 ++++++++++
.../rules_config | 3 +
.../service_config | 6 ++
.../static_service_stats | 6 ++
.../README | 14 +++++
.../cmdlist | 3 +
.../datacenter.cfg | 7 +++
.../dynamic_service_stats | 6 ++
.../hardware_status | 5 ++
.../log.expect | 59 +++++++++++++++++++
.../manager_status | 1 +
.../rules_config | 7 +++
.../service_config | 6 ++
.../static_service_stats | 6 ++
40 files changed, 452 insertions(+)
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/README
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
create mode 100644 src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/README b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
new file mode 100644
index 00000000..8504755f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/README
@@ -0,0 +1,7 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance running HA resources that cause a node imbalance exceeding the
+threshold, because their HA node affinity rules strictly require them to be
+kept on specific nodes.
+
+As a sanity check, the added HA resource, which is not part of the node
+affinity rule, is rebalanced to another node to lower the imbalance.
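The candidate filtering this test exercises can be sketched as follows; the function name and data layout are hypothetical, chosen only to mirror the fixture above:

```rust
use std::collections::HashMap;

// Hypothetical candidate filter: resources pinned to their current node by a
// strict node affinity rule are excluded from rebalancing.
fn rebalance_candidates<'a>(
    resources: &'a [&'a str],
    pinned: &HashMap<&str, &str>, // resource id -> required node
) -> Vec<&'a str> {
    resources
        .iter()
        .copied()
        .filter(|res| !pinned.contains_key(res))
        .collect()
}

fn main() {
    let pinned: HashMap<_, _> =
        [("vm:101", "node1"), ("vm:102", "node1"), ("vm:103", "node1")].into();
    let all = ["vm:101", "vm:102", "vm:103", "vm:104"];
    // Only vm:104, which is not part of the node affinity rule, may move.
    assert_eq!(rebalance_candidates(&all, &pinned), vec!["vm:104"]);
}
```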
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
new file mode 100644
index 00000000..6ee04948
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:104 add node1 started 1",
+ "service vm:104 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:104 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
new file mode 100644
index 00000000..02133ab0
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/dynamic_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
new file mode 100644
index 00000000..d0b2aee2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/log.expect
@@ -0,0 +1,49 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:104 add node1 started 1
+info 120 cmdlist: execute service vm:104 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:104 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:104' on node 'node1'
+info 120 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.98)
+info 140 node1/crm: got crm command: migrate vm:104 node2
+info 140 node1/crm: migrate service 'vm:104' to node 'node2'
+info 140 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:104
+info 163 node2/lrm: service status vm:104 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
new file mode 100644
index 00000000..00f615e9
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/rules_config
@@ -0,0 +1,4 @@
+node-affinity: vm101-stays-on-node1
+ nodes node1
+ resources vm:101,vm:102,vm:103
+ strict 1
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
new file mode 100644
index 00000000..57e3579d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
new file mode 100644
index 00000000..b11cc5eb
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance1/static_service_stats
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/README b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
new file mode 100644
index 00000000..be072f6d
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/README
@@ -0,0 +1,12 @@
+Test that the auto rebalance system with dynamic usage information will
+consider running HA resources in strict positive resource affinity rules as
+bundles, which can only be moved to other nodes as single units.
+
+Therefore, even though the two initial HA resources cause a node imbalance in
+the cluster and would otherwise be split apart, the auto rebalance system does
+not issue a rebalancing migration, because they must stay together.
+
+As a sanity check, adding another HA resource, which is not part of the strict
+positive resource affinity rule, causes a rebalancing migration: here the
+resource bundle itself is migrated, because its leading resource 'vm:101'
+comes first alphabetically.
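The bundle-as-one-unit behavior can be sketched like this; the function is an illustrative assumption, not the actual ha-manager implementation:

```rust
// Hypothetical sketch: a strict positive affinity rule forms one migration
// unit, identified by its alphabetically first resource id.
fn bundle_leader<'a>(bundle: &mut Vec<&'a str>) -> &'a str {
    bundle.sort(); // "vm:101" sorts before "vm:102" lexicographically
    bundle[0]
}

fn main() {
    let mut bundle = vec!["vm:102", "vm:101"];
    // The rebalancer issues a single migration for the leader; affinity
    // handling then moves every bundle member to the same target node.
    assert_eq!(bundle_leader(&mut bundle), "vm:101");
}
```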
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
new file mode 100644
index 00000000..61373367
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/cmdlist
@@ -0,0 +1,8 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ],
+ [
+ "service vm:103 add node1 started 1",
+ "service vm:103 set-static-stats maxcpu 8.0 maxmem 8192",
+ "service vm:103 set-dynamic-stats cpu 4.0 mem 4096"
+ ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
new file mode 100644
index 00000000..4f81dfe2
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/dynamic_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
new file mode 100644
index 00000000..48501321
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute service vm:103 add node1 started 1
+info 120 cmdlist: execute service vm:103 set-static-stats maxcpu 8.0 maxmem 8192
+info 120 cmdlist: execute service vm:103 set-dynamic-stats cpu 4.0 mem 4096
+info 120 node1/crm: adding new service 'vm:103' on node 'node1'
+info 120 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node1)
+info 140 node1/crm: auto rebalance - migrate vm:101 to node2 (expected target imbalance: 0.86)
+info 140 node1/crm: got crm command: migrate vm:101 node2
+info 140 node1/crm: crm command 'migrate vm:101 node2' - migrate service 'vm:102' to node 'node2' (service 'vm:102' in positive affinity with service 'vm:101')
+info 140 node1/crm: migrate service 'vm:101' to node 'node2'
+info 140 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 140 node1/crm: migrate service 'vm:102' to node 'node2'
+info 140 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 141 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 141 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 160 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 160 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 163 node2/lrm: starting service vm:101
+info 163 node2/lrm: service status vm:101 started
+info 163 node2/lrm: starting service vm:102
+info 163 node2/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
new file mode 100644
index 00000000..9e26dfee
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
new file mode 100644
index 00000000..e1948a00
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:102
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
new file mode 100644
index 00000000..880e0a59
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/service_config
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
new file mode 100644
index 00000000..455ae043
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance2/static_service_stats
@@ -0,0 +1,4 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/README b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
new file mode 100644
index 00000000..4b4d4855
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will wait
+for an ongoing resource migration to finish, because a strict positive
+resource affinity rule is not enforced yet.
+
+This test case manipulates the manager status so that the HA Manager assumes
+the not-yet-migrated HA resource in the strict positive resource affinity
+rule is still migrating, as the integration tests currently do not support
+prolonged migrations.
+
+Furthermore, setting the hold duration to 0 forces auto rebalancing migrations
+to be issued as soon as possible. This ensures that if the auto rebalance
+system did not wait on the ongoing migration, the rebalancing migration would
+be issued right away, in the same round in which the HA resources are
+acknowledged as running.
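The wait condition being tested can be sketched as a simple settledness check; the state strings and function name are assumptions for illustration only:

```rust
// Minimal sketch: the rebalancer skips a positive-affinity bundle while any
// member is still migrating (state names are illustrative assumptions).
fn bundle_settled(member_states: &[&str]) -> bool {
    member_states.iter().all(|state| *state == "started")
}

fn main() {
    // One member is still in 'migrate' state towards the common node, so the
    // bundle is not a rebalancing candidate yet, even with hold duration 0.
    assert!(!bundle_settled(&["migrate", "started", "started"]));
    // Once every member is 'started', rebalancing may consider the bundle.
    assert!(bundle_settled(&["started", "started", "started"]));
}
```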
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
new file mode 100644
index 00000000..181ea848
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/datacenter.cfg
@@ -0,0 +1,8 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1,
+ "ha-auto-rebalance-hold-duration": 0
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
new file mode 100644
index 00000000..d35a2c8f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 2621440000 },
+ "vm:102": { "cpu": 7.9, "mem": 8589934592 },
+ "vm:103": { "cpu": 4.7, "mem": 5242880000 },
+ "vm:104": { "cpu": 4.0, "mem": 4294967296 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
new file mode 100644
index 00000000..1242f827
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 23 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 41 node1/lrm: starting service vm:101
+info 41 node1/lrm: service status vm:101 started
+info 60 node1/crm: auto rebalance - migrate vm:102 to node2 (expected target imbalance: 0.72)
+info 60 node1/crm: got crm command: migrate vm:102 node2
+info 60 node1/crm: migrate service 'vm:102' to node 'node2'
+info 60 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 61 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 61 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 80 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 83 node2/lrm: starting service vm:102
+info 83 node2/lrm: service status vm:102 started
+info 100 node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.27)
+info 100 node1/crm: got crm command: migrate vm:101 node3
+info 100 node1/crm: crm command 'migrate vm:101 node3' - migrate service 'vm:103' to node 'node3' (service 'vm:103' in positive affinity with service 'vm:101')
+info 100 node1/crm: migrate service 'vm:101' to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 100 node1/crm: migrate service 'vm:103' to node 'node3'
+info 100 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 101 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 101 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 105 node3/lrm: got lock 'ha_agent_node3_lock'
+info 105 node3/lrm: status change wait_for_agent_lock => active
+info 120 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 120 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 125 node3/lrm: starting service vm:101
+info 125 node3/lrm: service status vm:101 started
+info 125 node3/lrm: starting service vm:103
+info 125 node3/lrm: service status vm:103 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
new file mode 100644
index 00000000..cf90037c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/manager_status
@@ -0,0 +1,31 @@
+{
+ "master_node": "node1",
+ "node_status": {
+ "node1":"online",
+ "node2":"online",
+ "node3":"online"
+ },
+ "service_status": {
+ "vm:101": {
+ "node": "node2",
+ "state": "migrate",
+ "target": "node1",
+ "uid": "RoPGTlvNYq/oZFokv9fgWw"
+ },
+ "vm:102": {
+ "node": "node1",
+ "state": "started",
+ "uid": "fR3i18EHk6DhF8Zd2jddNX"
+ },
+ "vm:103": {
+ "node": "node1",
+ "state": "started",
+ "uid": "JVDARwmsXoVTF8Zd0BY2Mg"
+ },
+ "vm:104": {
+ "node": "node1",
+ "state": "started",
+ "uid": "23hk23EHk6DhF8Zd0218DD"
+ }
+ }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
new file mode 100644
index 00000000..2c3f3171
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/rules_config
@@ -0,0 +1,3 @@
+resource-affinity: vms-stay-together
+ resources vm:101,vm:103
+ affinity positive
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
new file mode 100644
index 00000000..3dadaabc
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node2", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node1", "state": "started" },
+ "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance3/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/README b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
new file mode 100644
index 00000000..e304cc22
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/README
@@ -0,0 +1,14 @@
+Test that the auto rebalance system with dynamic usage information will not
+rebalance a HA resource onto the same node as another HA resource that it is
+in a strict negative resource affinity rule with.
+
+There is a high node imbalance, since vm:101 and vm:102 on node1 cause a
+higher usage than node2 and node3 have. Even though it would be ideal to move
+one of them to node2 because of its very low usage, neither can be moved
+there, as vm:101 and vm:102 are each in a strict negative resource affinity
+rule with a HA resource on node2.
+
+To minimize the imbalance in the cluster, one of the HA resources from node1
+is migrated to node3 first; afterwards, the HA resource on node3, which is
+not in a strict negative resource affinity rule with a HA resource on node2,
+is migrated to node2.
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
new file mode 100644
index 00000000..13f90cd7
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/cmdlist
@@ -0,0 +1,3 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on" ]
+]
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
new file mode 100644
index 00000000..147bd61a
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+ "crs": {
+ "ha": "dynamic",
+ "ha-auto-rebalance": 1
+ }
+}
+
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
new file mode 100644
index 00000000..083f338b
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/dynamic_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "cpu": 0.9, "mem": 4294967296 },
+ "vm:102": { "cpu": 2.4, "mem": 2621440000 },
+ "vm:103": { "cpu": 0.0, "mem": 0 },
+ "vm:104": { "cpu": 1.0, "mem": 1073741824 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
new file mode 100644
index 00000000..8f1e695c
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node2": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 },
+ "node3": { "power": "off", "network": "off", "maxcpu": 24, "maxmem": 51539607552 }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
new file mode 100644
index 00000000..58f1b481
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/log.expect
@@ -0,0 +1,59 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'dynamic'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node2'
+info 20 node1/crm: adding new service 'vm:104' on node 'node3'
+info 20 node1/crm: service 'vm:101': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:102': state changed from 'request_start' to 'started' (node = node1)
+info 20 node1/crm: service 'vm:103': state changed from 'request_start' to 'started' (node = node2)
+info 20 node1/crm: service 'vm:104': state changed from 'request_start' to 'started' (node = node3)
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:104
+info 25 node3/lrm: service status vm:104 started
+info 80 node1/crm: auto rebalance - migrate vm:101 to node3 (expected target imbalance: 0.72)
+info 80 node1/crm: got crm command: migrate vm:101 node3
+info 80 node1/crm: migrate service 'vm:101' to node 'node3'
+info 80 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 81 node1/lrm: service vm:101 - start migrate to node 'node3'
+info 81 node1/lrm: service vm:101 - end migrate to node 'node3'
+info 100 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
+info 105 node3/lrm: starting service vm:101
+info 105 node3/lrm: service status vm:101 started
+info 160 node1/crm: auto rebalance - migrate vm:104 to node2 (expected target imbalance: 0.33)
+info 160 node1/crm: got crm command: migrate vm:104 node2
+info 160 node1/crm: migrate service 'vm:104' to node 'node2'
+info 160 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node3, target = node2)
+info 165 node3/lrm: service vm:104 - start migrate to node 'node2'
+info 165 node3/lrm: service vm:104 - end migrate to node 'node2'
+info 180 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 183 node2/lrm: starting service vm:104
+info 183 node2/lrm: service status vm:104 started
+info 620 hardware: exit simulation - done
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
new file mode 100644
index 00000000..0967ef42
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
new file mode 100644
index 00000000..eef5460f
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/rules_config
@@ -0,0 +1,7 @@
+resource-affinity: vms-stay-apart1
+ resources vm:101,vm:103
+ affinity negative
+
+resource-affinity: vms-stay-apart2
+ resources vm:102,vm:103
+ affinity negative
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
new file mode 100644
index 00000000..16bffacf
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/service_config
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "node": "node1", "state": "started" },
+ "vm:102": { "node": "node1", "state": "started" },
+ "vm:103": { "node": "node2", "state": "started" },
+ "vm:104": { "node": "node3", "state": "started" }
+}
diff --git a/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
new file mode 100644
index 00000000..ff1e50f8
--- /dev/null
+++ b/src/test/test-crs-dynamic-constrained-auto-rebalance4/static_service_stats
@@ -0,0 +1,6 @@
+{
+ "vm:101": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:102": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:103": { "maxcpu": 8.0, "maxmem": 8589934592 },
+ "vm:104": { "maxcpu": 8.0, "maxmem": 8589934592 }
+}
--
2.47.3
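The constraint exercised by test-crs-dynamic-constrained-auto-rebalance4 — pick the
migration that best reduces node load while never co-locating services from a strict
negative affinity rule — can be sketched as follows. This is purely illustrative and
not the ha-manager implementation: the function name `rebalance_step`, the greedy
"minimize the maximum per-node CPU load" score, and the plain-dict inputs are all
assumptions for the sketch; the real scheduler uses its own usage metrics and
imbalance score (the "expected target imbalance" values in log.expect).

```python
# Hypothetical sketch (not the actual ha-manager code): one greedy rebalance
# step that respects strict negative resource affinity, mirroring the
# scenario in test-crs-dynamic-constrained-auto-rebalance4.
from itertools import product


def rebalance_step(placement, cpu_usage, negative_pairs):
    """Return the (service, target) migration that minimizes the maximum
    per-node CPU load, skipping any target that already hosts a negative
    affinity partner of the service; None if no migration is possible."""

    def node_load(pl):
        # Sum up CPU usage per node for a given placement.
        load = {}
        for svc, node in pl.items():
            load[node] = load.get(node, 0.0) + cpu_usage[svc]
        return load

    nodes = set(placement.values())
    best = None
    for svc, target in product(placement, nodes):
        if placement[svc] == target:
            continue
        # Strict negative affinity: partners may never share a node.
        partners = {b for a, b in negative_pairs if a == svc} | {
            a for a, b in negative_pairs if b == svc
        }
        if any(placement[p] == target for p in partners):
            continue
        trial = dict(placement, **{svc: target})
        score = max(node_load(trial).values())
        if best is None or score < best[0]:
            best = (score, svc, target)
    return (best[1], best[2]) if best else None
```

With the test's dynamic stats (vm:101 at 0.9 and vm:102 at 2.4 CPU on node1,
vm:103 at 0.0 on node2, vm:104 at 1.0 on node3) and the two negative affinity
rules, node2 is excluded as a target for both node1 services, so the sketch picks
migrating vm:101 to node3 — the same first move the expected log shows.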