public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager
@ 2022-11-10 14:37 Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
                   ` (21 more replies)
  0 siblings, 22 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.

This also includes the remaining cgroup/cpuunits-related patches,
because the broadcast of static node information was done to also
include the node's cgroup mode.

With this version, the effect is limited to choosing nodes during
recovery, but the plan is to extend this.

As a next step, it would be nice to also have this for startup, but
AFAICT the issue is that the node selection only happens after the
state is already set to 'started', and I think select_service_node()
currently cannot tell whether a service has been newly started. I
haven't looked into it in too much detail though.

An idea to get a balancer out of it is to:
1. (optionally) sort all services by badness (needs a new backend function)
2. iterate scoring the nodes for each service, adding the usage to the
   chosen node after each iteration. The current node can be kept if its
   score doesn't differ too much from the best node's.
3. record the chosen nodes and migrate the services accordingly (see
   the sketch below).
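
A rough sketch of such a balancer pass in Rust, in terms of the API
from the proxmox-resource-scheduling crate (hypothetical glue code,
not part of this series; error and edge-case handling omitted):

    use anyhow::Error;
    use proxmox_resource_scheduling::pve_static::{
        score_nodes_to_start_service, StaticNodeUsage, StaticServiceUsage,
    };

    /// Step 2 above: services are assumed to be sorted by badness
    /// already. Usage is accounted to the chosen node after each
    /// iteration.
    fn plan_migrations(
        nodes: &mut Vec<StaticNodeUsage>,
        services: &[(String, StaticServiceUsage)],
    ) -> Result<Vec<(String, String)>, Error> {
        let mut migrations = Vec::new();
        for (sid, service) in services {
            let candidates: Vec<&StaticNodeUsage> = nodes.iter().collect();
            let mut scored = score_nodes_to_start_service(&candidates, service)?;
            // higher score is better
            scored.sort_by(|(_, a), (_, b)| b.total_cmp(a));
            let best = scored[0].0.clone();
            // a real balancer would keep the current node here if its
            // score is close enough to the best one
            let node = nodes.iter_mut().find(|n| n.name == best).unwrap();
            node.add_service_usage(service);
            migrations.push((sid.clone(), best));
        }
        Ok(migrations)
    }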

Unit tests for the ha-manager itself are also still missing.


Almost all of the series is preparatory infrastructure, but the hope
is that much of it can be re-used for balancers and dynamic
scheduling in the future.

The proxmox-resource-scheduling Rust crate implements the TOPSIS
algorithm, first suggested by Alexandre. It also models the static
node and service usages in PVE and allows scoring nodes on which to
start a new or recovered service. This is done by simulating starting
the service on each node and comparing the alternatives, with average
and highest CPU and memory usage as criteria. Memory is weighted much
more, as it is a more limited resource than CPU.

I did not implement the criteria weighing process from AHP (yet),
also suggested by Alexandre, which computes averaged weights and a
bias score from a table of pairwise weights between criteria. The
downside is that one needs to guess n(n-1)/2 weights instead of n
(e.g. 6 instead of 4 for the four criteria used here); the upside is
that each weight only has to be estimated pairwise rather than
relative to all others. This can still be done in the future if we
want.
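
For reference, a minimal sketch of how the AHP weight derivation could
look (assuming the common row-geometric-mean approximation; not part
of this series):

    /// Given a pairwise comparison matrix `m`, where m[i][j] states
    /// how much more important criterion i is than criterion j (and
    /// m[j][i] == 1.0 / m[i][j]), approximate the criteria weights by
    /// the normalized geometric means of the rows.
    fn ahp_weights<const N: usize>(m: [[f64; N]; N]) -> [f64; N] {
        let mut weights = [0.0; N];
        for i in 0..N {
            weights[i] = m[i].iter().product::<f64>().powf(1.0 / N as f64);
        }
        let sum: f64 = weights.iter().sum();
        for weight in weights.iter_mut() {
            *weight /= sum;
        }
        weights
    }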

In proxmox-perl-rs, a class is provided for interfacing from Perl.

In pve-manager, the static node information is broadcast whenever it
is outdated. There are also the unrelated (but touching the same code)
cgroup/cpuunits patches.

In pve-cluster, a new crs (= cluster resource scheduling) option is
added, initially with a mode setting for HA.
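
Assuming the schema lands as proposed, the resulting datacenter.cfg
entry would look something like:

    crs: ha=static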

In pve-ha-manager, the online node usage calculation is factored out
into a 'Usage' plugin system to ease adding the new static mode
without much clutter. If not all nodes provide static service
information, we fall back to the 'basic' mode. If only the scoring
fails (but that /should/ be rather unlikely), there is currently no
real fallback implemented (the '|| $a cmp $b' in select_service_node()
destroys the random hash key order again ;)). We could change it to
stay random or, better, track the service count in Usage::Static too
and use that.


Dependency bumps needed:
proxmox-perl-rs depends on proxmox-resource-scheduling
pve-ha-manager (build-)depends on proxmox-perl-rs
The new feature is only usable with updated pve-manager and
pve-cluster of course, but there is no hard dependency.


proxmox-resource-scheduling:

Fiona Ebner (3):
  initial commit
  add pve_static module
  add Debian packaging


proxmox-perl-rs:

Fiona Ebner (2):
  pve-rs: add resource scheduling module
  add basic test for resource scheduling

 Makefile                                 |   1 +
 pve-rs/Cargo.toml                        |   1 +
 pve-rs/src/lib.rs                        |   1 +
 pve-rs/src/resource_scheduling/mod.rs    |   1 +
 pve-rs/src/resource_scheduling/static.rs | 116 +++++++++++++++++++++++
 pve-rs/test/Makefile                     |   4 +
 pve-rs/test/README                       |   2 +
 pve-rs/test/resource_scheduling.pl       |  70 ++++++++++++++
 8 files changed, 196 insertions(+)
 create mode 100644 pve-rs/src/resource_scheduling/mod.rs
 create mode 100644 pve-rs/src/resource_scheduling/static.rs
 create mode 100644 pve-rs/test/Makefile
 create mode 100644 pve-rs/test/README
 create mode 100755 pve-rs/test/resource_scheduling.pl


pve-manager:

Fiona Ebner (3):
  pvestatd: broadcast static node information
  cluster resources: add cgroup-mode to node properties
  ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version

 PVE/API2/Cluster.pm                | 13 +++++++++++++
 PVE/Service/pvestatd.pm            | 25 ++++++++++++++++++++++++
 www/manager6/lxc/CreateWizard.js   |  8 ++++++++
 www/manager6/lxc/ResourceEdit.js   | 31 +++++++++++++++++++++++++-----
 www/manager6/lxc/Resources.js      |  8 +++++++-
 www/manager6/qemu/CreateWizard.js  |  8 ++++++++
 www/manager6/qemu/HardwareView.js  |  8 +++++++-
 www/manager6/qemu/ProcessorEdit.js | 31 +++++++++++++++++++++++-------
 8 files changed, 118 insertions(+), 14 deletions(-)


pve-cluster:

Fiona Ebner (1):
  datacenter config: add cluster resource scheduling (crs) options

 data/PVE/DataCenterConfig.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


pve-ha-manager:

Fiona Ebner (11):
  env: add get_static_node_stats() method
  resources: add get_static_stats() method
  add Usage base plugin and Usage::Basic plugin
  manager: select service node: add $sid to parameters
  manager: online node usage: switch to Usage::Basic plugin
  usage: add Usage::Static plugin
  env: add get_crs_settings() method
  manager: set resource scheduler mode upon init
  manager: use static resource scheduler when configured
  manager: avoid scoring nodes if maintenance fallback node is valid
  manager: avoid scoring nodes when not trying next and current node is
    valid

 debian/pve-ha-manager.install |   3 +
 src/PVE/HA/Env.pm             |  13 ++++
 src/PVE/HA/Env/PVE2.pm        |  29 +++++++++
 src/PVE/HA/Makefile           |   3 +-
 src/PVE/HA/Manager.pm         |  77 ++++++++++++++---------
 src/PVE/HA/Resources.pm       |   5 ++
 src/PVE/HA/Resources/PVECT.pm |  11 ++++
 src/PVE/HA/Resources/PVEVM.pm |  14 +++++
 src/PVE/HA/Sim/Env.pm         |   9 +++
 src/PVE/HA/Sim/TestEnv.pm     |   6 ++
 src/PVE/HA/Usage.pm           |  50 +++++++++++++++
 src/PVE/HA/Usage/Basic.pm     |  52 ++++++++++++++++
 src/PVE/HA/Usage/Makefile     |   6 ++
 src/PVE/HA/Usage/Static.pm    | 114 ++++++++++++++++++++++++++++++++++
 src/test/test_failover1.pl    |  21 ++++---
 15 files changed, 374 insertions(+), 39 deletions(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile
 create mode 100644 src/PVE/HA/Usage/Static.pm


pve-docs:

Fiona Ebner (1):
  ha: add section about scheduler modes

 ha-manager.adoc | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.30.2






* [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-15 10:15   ` [pve-devel] applied: " Wolfgang Bumiller
  2022-11-15 15:39   ` [pve-devel] " DERUMIER, Alexandre
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module Fiona Ebner
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Implement the TOPSIS[0] algorithm to score multi-valued alternatives
according to a given set of weighted criteria.

The number of alternatives cannot be known at compile time, but the
number of criteria should be (a given module using the topsis module
should have one (or more) fixed sets of criteria). Therefore, the
TopsisMatrix is implemented as a Vec of N_CRITERIA-sized arrays.

Compared to the description in [0], the weighing of the matrix
according to the criteria weights only happens during the distance
calculation to the idealized alternatives. It turned out to be more
natural that way, because the matrix doesn't need to be mutable.
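
As a small worked example of the resulting scores, a sketch using this
module's public API (values chosen for illustration):

    use anyhow::Error;
    use proxmox_resource_scheduling::topsis::{
        score_alternatives, TopsisCriteria, TopsisCriterion, TopsisMatrix,
    };

    fn main() -> Result<(), Error> {
        // one criterion to be maximized, two alternatives
        let criteria =
            TopsisCriteria::new([TopsisCriterion::new("cpu".to_string(), 1.0)])?;
        let matrix = TopsisMatrix::new(vec![[1.0], [2.0]])?;
        // normalized values are 1/sqrt(5) and 2/sqrt(5); the second
        // alternative is the ideal best, the first the ideal worst,
        // so the scores are exactly 0.0 and 1.0
        assert_eq!(score_alternatives(&matrix, &criteria)?, vec![0.0, 1.0]);
        Ok(())
    }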

[0]: https://en.wikipedia.org/wiki/TOPSIS

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 .cargo/config   |   5 +
 .gitignore      |   2 +
 Cargo.toml      |  20 ++++
 rustfmt.toml    |   1 +
 src/lib.rs      |   1 +
 src/topsis.rs   | 216 +++++++++++++++++++++++++++++++++++
 tests/topsis.rs | 293 ++++++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 538 insertions(+)
 create mode 100644 .cargo/config
 create mode 100644 .gitignore
 create mode 100644 Cargo.toml
 create mode 100644 rustfmt.toml
 create mode 100644 src/lib.rs
 create mode 100644 src/topsis.rs
 create mode 100644 tests/topsis.rs

diff --git a/.cargo/config b/.cargo/config
new file mode 100644
index 0000000..3b5b6e4
--- /dev/null
+++ b/.cargo/config
@@ -0,0 +1,5 @@
+[source]
+[source.debian-packages]
+directory = "/usr/share/cargo/registry"
+[source.crates-io]
+replace-with = "debian-packages"
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..1e7caa9
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+Cargo.lock
+target/
diff --git a/Cargo.toml b/Cargo.toml
new file mode 100644
index 0000000..ec8e12f
--- /dev/null
+++ b/Cargo.toml
@@ -0,0 +1,20 @@
+[package]
+name = "proxmox-resource-scheduling"
+version = "0.1.0"
+authors = [
+    "Fiona Ebner <f.ebner@proxmox.com>",
+    "Proxmox Support Team <support@proxmox.com>",
+]
+edition = "2021"
+license = "AGPL-3"
+description = "Proxmox library for resource scheduling"
+homepage = "https://www.proxmox.com"
+
+exclude = [ "debian" ]
+
+[lib]
+name = "proxmox_resource_scheduling"
+path = "src/lib.rs"
+
+[dependencies]
+anyhow = "1.0"
diff --git a/rustfmt.toml b/rustfmt.toml
new file mode 100644
index 0000000..3a26366
--- /dev/null
+++ b/rustfmt.toml
@@ -0,0 +1 @@
+edition = "2021"
diff --git a/src/lib.rs b/src/lib.rs
new file mode 100644
index 0000000..dda0563
--- /dev/null
+++ b/src/lib.rs
@@ -0,0 +1 @@
+pub mod topsis;
diff --git a/src/topsis.rs b/src/topsis.rs
new file mode 100644
index 0000000..f932bcd
--- /dev/null
+++ b/src/topsis.rs
@@ -0,0 +1,216 @@
+use anyhow::{bail, Error};
+
+fn differences<const N: usize>(xs: &[f64; N], ys: &[f64; N]) -> [f64; N] {
+    let mut differences = [0.0; N];
+    for n in 0..N {
+        differences[n] = xs[n] - ys[n];
+    }
+    differences
+}
+
+/// Calculate the L^2-norm of the given values.
+fn l2_norm(values: &[f64]) -> f64 {
+    values
+        .iter()
+        .fold(0.0, |sum_of_squares, value| sum_of_squares + value * value)
+        .sqrt()
+}
+
+/// A criterion that can be used when scoring with the TOPSIS algorithm.
+pub struct TopsisCriterion {
+    /// Name of the criterion.
+    name: String,
+    /// The relative weight of the criterion. Is non-negative.
+    weight: f64,
+    /// Whether it's good to maximize or minimize the criterion.
+    maximize: bool,
+}
+
+impl TopsisCriterion {
+    /// Construct a new `TopsisCriterion`. Use a negative weight if the value for the criterion
+    /// should be minimized rather than maximized.
+    ///
+    /// Assumes that `weight` is finite.
+    pub fn new(name: String, weight: f64) -> Self {
+        let (maximize, weight) = if weight >= 0.0 {
+            (true, weight)
+        } else {
+            (false, -weight)
+        };
+
+        TopsisCriterion {
+            name,
+            weight,
+            maximize,
+        }
+    }
+}
+
+/// A normalized array of `TopsisCriterion`.
+pub struct TopsisCriteria<const N_CRITERIA: usize>([TopsisCriterion; N_CRITERIA]);
+
+/// A normalized matrix used for scoring with the TOPSIS algorithm.
+pub struct TopsisMatrix<const N_CRITERIA: usize>(Vec<[f64; N_CRITERIA]>);
+
+/// Idealized alternatives from a `TopsisMatrix`. That is, the alternative consisting of the best
+/// (respectively worst) value among the alternatives in the matrix for each single criterion.
+struct TopsisIdealAlternatives<const N_CRITERIA: usize> {
+    best: [f64; N_CRITERIA],
+    worst: [f64; N_CRITERIA],
+}
+
+impl<const N: usize> TopsisCriteria<N> {
+    /// Create a new instance of normalized TOPSIS criteria.
+    ///
+    /// Assumes that the sum of weights can be calculated to a finite, non-zero value.
+    pub fn new(mut criteria: [TopsisCriterion; N]) -> Result<Self, Error> {
+        let divisor = criteria
+            .iter()
+            .fold(0.0, |sum, criterion| sum + criterion.weight);
+
+        if divisor == 0.0 {
+            bail!("no criterion has a non-zero weight");
+        }
+
+        for criterion in criteria.iter_mut() {
+            criterion.weight /= divisor;
+            if criterion.weight > 1.0 || criterion.weight < 0.0 || !criterion.weight.is_finite() {
+                bail!(
+                    "criterion '{}' got invalid weight {}",
+                    criterion.name,
+                    criterion.weight
+                );
+            }
+        }
+
+        Ok(TopsisCriteria(criteria))
+    }
+
+    /// Weigh each value according to the weight of its corresponding criterion.
+    pub fn weigh<'a>(&self, values: &'a mut [f64; N]) -> &'a [f64; N] {
+        for (n, value) in values.iter_mut().enumerate() {
+            *value *= self.0[n].weight
+        }
+        values
+    }
+}
+
+impl<const N: usize> TopsisMatrix<N> {
+    /// Values of the matrix for the fixed critierion with index `index`.
+    fn fixed_criterion(&self, index: usize) -> Vec<f64> {
+        self.0
+            .iter()
+            .map(|alternative| alternative[index])
+            .collect::<Vec<_>>()
+    }
+
+    /// Mutable values of the matrix for the fixed criterion with index `index`.
+    fn fixed_criterion_mut(&mut self, index: usize) -> Vec<&mut f64> {
+        self.0
+            .iter_mut()
+            .map(|alternative| &mut alternative[index])
+            .collect::<Vec<&mut _>>()
+    }
+
+    /// Create a normalized `TopsisMatrix` based on the given values.
+    ///
+    /// Assumes that the sum of squares for each fixed criterion in `matrix` can be calculated to a
+    /// finite value.
+    pub fn new(matrix: Vec<[f64; N]>) -> Result<Self, Error> {
+        let mut matrix = TopsisMatrix(matrix);
+        for n in 0..N {
+            let divisor = l2_norm(&matrix.fixed_criterion(n));
+
+            // If every alternative has zero value for the given criterion, keep it like that.
+            if divisor != 0.0 {
+                for value in matrix.fixed_criterion_mut(n).into_iter() {
+                    *value /= divisor;
+                    if !value.is_finite() {
+                        bail!("criterion {} got invalid value {}", n, value);
+                    }
+                }
+            }
+        }
+
+        Ok(matrix)
+    }
+}
+
+/// Compute the idealized alternatives from the given `matrix`. The `criteria` are required to know
+/// if a criterion should be maximized or minimized.
+fn ideal_alternatives<const N: usize>(
+    matrix: &TopsisMatrix<N>,
+    criteria: &TopsisCriteria<N>,
+) -> TopsisIdealAlternatives<N> {
+    let criteria = &criteria.0;
+
+    let mut best = [0.0; N];
+    let mut worst = [0.0; N];
+
+    for n in 0..N {
+        let fixed_criterion = matrix.fixed_criterion(n);
+        let min = fixed_criterion
+            .iter()
+            .min_by(|a, b| a.total_cmp(b))
+            .unwrap();
+        let max = fixed_criterion
+            .iter()
+            .max_by(|a, b| a.total_cmp(b))
+            .unwrap();
+
+        (best[n], worst[n]) = match criteria[n].maximize {
+            true => (*max, *min),
+            false => (*min, *max),
+        }
+    }
+
+    TopsisIdealAlternatives { best, worst }
+}
+
+/// Scores the alternatives in `matrix` by their relative closeness to the idealized alternatives.
+/// Scores range from 0.0 to 1.0 and a low score indicates closeness to the ideal worst and/or
+/// distance from the ideal best alternative.
+pub fn score_alternatives<const N: usize>(
+    matrix: &TopsisMatrix<N>,
+    criteria: &TopsisCriteria<N>,
+) -> Result<Vec<f64>, Error> {
+    let ideal_alternatives = ideal_alternatives(matrix, criteria);
+    let ideal_best = &ideal_alternatives.best;
+    let ideal_worst = &ideal_alternatives.worst;
+
+    let mut scores = vec![];
+
+    for alternative in matrix.0.iter() {
+        let distance_to_best = l2_norm(criteria.weigh(&mut differences(alternative, ideal_best)));
+        let distance_to_worst = l2_norm(criteria.weigh(&mut differences(alternative, ideal_worst)));
+
+        let divisor = distance_to_worst + distance_to_best;
+        if divisor == 0.0 {
+            // Can happen if all alternatives are the same.
+            scores.push(0.0);
+        } else {
+            scores.push(distance_to_worst / divisor);
+        }
+    }
+
+    if let Some(score) = scores.iter().find(|score| !score.is_finite()) {
+        bail!("invalid score {}", score);
+    }
+
+    Ok(scores)
+}
+
+/// Similar to `score_alternatives`, but returns the list of indices decreasing by score.
+pub fn rank_alternatives<const N: usize>(
+    matrix: &TopsisMatrix<N>,
+    criteria: &TopsisCriteria<N>,
+) -> Result<Vec<usize>, Error> {
+    let scores = score_alternatives(matrix, criteria)?;
+
+    let mut enumerated = scores
+        .into_iter()
+        .enumerate()
+        .collect::<Vec<(usize, f64)>>();
+    enumerated.sort_by(|(_, a), (_, b)| b.total_cmp(a));
+    Ok(enumerated.into_iter().map(|(n, _)| n).collect())
+}
diff --git a/tests/topsis.rs b/tests/topsis.rs
new file mode 100644
index 0000000..bfb8624
--- /dev/null
+++ b/tests/topsis.rs
@@ -0,0 +1,293 @@
+use anyhow::Error;
+
+use proxmox_resource_scheduling::topsis::{
+    rank_alternatives, TopsisCriteria, TopsisCriterion, TopsisMatrix,
+};
+
+#[test]
+fn test_topsis_single_criterion() -> Result<(), Error> {
+    let criteria_pos =
+        TopsisCriteria::new([TopsisCriterion::new("the one and only".to_string(), 1.0)])?;
+
+    let criteria_neg =
+        TopsisCriteria::new([TopsisCriterion::new("the one and only".to_string(), -1.0)])?;
+
+    let matrix = vec![[0.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_pos)?,
+        vec![0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_neg)?,
+        vec![0],
+    );
+
+    let matrix = vec![[0.0], [2.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_pos)?,
+        vec![1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_neg)?,
+        vec![0, 1],
+    );
+
+    let matrix = vec![[1.0], [2.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_pos)?,
+        vec![1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_neg)?,
+        vec![0, 1],
+    );
+
+    let matrix = vec![[1.0], [2.0], [5.0], [3.0], [4.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_pos)?,
+        vec![2, 4, 3, 1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_neg)?,
+        vec![0, 1, 3, 4, 2],
+    );
+
+    let matrix = vec![[-2.0], [-5.0], [-4.0], [1.0], [-3.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_pos)?,
+        vec![3, 0, 4, 2, 1],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_neg)?,
+        vec![1, 2, 4, 0, 3],
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_topsis_two_criteria() -> Result<(), Error> {
+    let criteria_max_min = TopsisCriteria::new([
+        TopsisCriterion::new("max".to_string(), 1.0),
+        TopsisCriterion::new("min".to_string(), -1.0),
+    ])?;
+    let criteria_min_max = TopsisCriteria::new([
+        TopsisCriterion::new("min".to_string(), -1.0),
+        TopsisCriterion::new("max".to_string(), 1.0),
+    ])?;
+    let criteria_min_min = TopsisCriteria::new([
+        TopsisCriterion::new("min1".to_string(), -1.0),
+        TopsisCriterion::new("min2".to_string(), -1.0),
+    ])?;
+    let criteria_max_max = TopsisCriteria::new([
+        TopsisCriterion::new("max1".to_string(), 1.0),
+        TopsisCriterion::new("max2".to_string(), 1.0),
+    ])?;
+
+    let matrix = vec![[0.0, 0.0]];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_max_min)?,
+        vec![0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_max)?,
+        vec![0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_min)?,
+        vec![0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_max_max)?,
+        vec![0],
+    );
+
+    #[rustfmt::skip]
+    let matrix = vec![
+        [0.0, 1.0],
+        [1.0, 0.0],
+        [1.0, -1.0],
+        [1.0, 1.0],
+        [0.0, 0.0],
+    ];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_max_min)?,
+        vec![2, 1, 4, 3, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_max)?,
+        vec![0, 3, 4, 1, 2],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_min)?,
+        vec![2, 4, 1, 0, 3],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_max_max)?,
+        vec![3, 0, 1, 4, 2],
+    );
+
+    // This one was cross-checked with https://decision-radar.com/topsis rather than manually.
+    #[rustfmt::skip]
+    let matrix = vec![
+        [7.0, 4.0],
+        [1.0, 0.5],
+        [-1.0, -1.0],
+        [-8.5, 11.5],
+        [4.0, 7.0],
+    ];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_max_min)?,
+        vec![0, 1, 4, 2, 3],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_max)?,
+        vec![3, 2, 4, 1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_min_min)?,
+        vec![2, 3, 1, 0, 4],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_max_max)?,
+        vec![4, 0, 1, 3, 2],
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_topsis_three_criteria() -> Result<(), Error> {
+    let criteria_first = TopsisCriteria::new([
+        TopsisCriterion::new("more".to_string(), 10.0),
+        TopsisCriterion::new("less".to_string(), 0.2),
+        TopsisCriterion::new("least".to_string(), 0.1),
+    ])?;
+    let criteria_second = TopsisCriteria::new([
+        TopsisCriterion::new("less".to_string(), 0.2),
+        TopsisCriterion::new("more".to_string(), 10.0),
+        TopsisCriterion::new("least".to_string(), 0.1),
+    ])?;
+    let criteria_third = TopsisCriteria::new([
+        TopsisCriterion::new("less".to_string(), 0.2),
+        TopsisCriterion::new("least".to_string(), 0.1),
+        TopsisCriterion::new("more".to_string(), 10.0),
+    ])?;
+    let criteria_min = TopsisCriteria::new([
+        TopsisCriterion::new("most".to_string(), -10.0),
+        TopsisCriterion::new("more".to_string(), -5.0),
+        TopsisCriterion::new("less".to_string(), 0.1),
+    ])?;
+
+    #[rustfmt::skip]
+    let matrix = vec![
+        [1.0, 0.0, 0.0],
+        [0.0, 1.0, 0.0],
+        [0.0, 0.0, 1.0],
+    ];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_first)?,
+        vec![0, 1, 2],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_second)?,
+        vec![1, 0, 2],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_third)?,
+        vec![2, 0, 1],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_min)?,
+        vec![2, 1, 0],
+    );
+
+    #[rustfmt::skip]
+    let matrix = vec![
+        [1.0, 0.0, 0.0],
+        [0.0, 1.0, 0.0],
+        [0.0, 0.0, 1000.0],
+    ];
+    // normalization ensures that it's still the same as above
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_first)?,
+        vec![0, 1, 2],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_second)?,
+        vec![1, 0, 2],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_third)?,
+        vec![2, 0, 1],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_min)?,
+        vec![2, 1, 0],
+    );
+
+    #[rustfmt::skip]
+    let matrix = vec![
+        [-1.0, 0.0, 0.0],
+        [0.0, -1.0, 0.0],
+        [0.0, 0.0, 1.0],
+    ];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_first)?,
+        vec![2, 1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_second)?,
+        vec![2, 0, 1],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix.clone())?, &criteria_third)?,
+        vec![2, 1, 0],
+    );
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_min)?,
+        vec![0, 1, 2],
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_nan() {
+    #[rustfmt::skip]
+    let matrix = vec![
+        [-1.0, 0.0, 0.0],
+        [0.0, -1.0, 0.0],
+        [0.0, f64::NAN, 1.0],
+    ];
+    assert!(TopsisMatrix::new(matrix).is_err());
+}
+
+#[test]
+fn test_zero() -> Result<(), Error> {
+    let criteria_zero = TopsisCriteria::new([
+        TopsisCriterion::new("z".to_string(), 0.0),
+        TopsisCriterion::new("e".to_string(), 0.0),
+        TopsisCriterion::new("ro".to_string(), 0.0),
+    ]);
+    assert!(criteria_zero.is_err());
+
+    let criteria_first = TopsisCriteria::new([
+        TopsisCriterion::new("more".to_string(), 10.0),
+        TopsisCriterion::new("less".to_string(), 0.2),
+        TopsisCriterion::new("least".to_string(), 0.1),
+    ])?;
+
+    #[rustfmt::skip]
+    let matrix = vec![
+        [0.0, 0.0, 0.0],
+        [0.0, 1.0, 0.0],
+        [0.0, 0.0, 1.0],
+    ];
+    assert_eq!(
+        rank_alternatives(&TopsisMatrix::new(matrix)?, &criteria_first)?,
+        vec![1, 2, 0],
+    );
+
+    Ok(())
+}
-- 
2.30.2






* [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-16  9:18   ` Thomas Lamprecht
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 3/3] add Debian packaging Fiona Ebner
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Models usage of guests and nodes, and allows scoring nodes on which to
start a new service via TOPSIS. For this scoring, each node in turn is
considered as if the service were already running on it.

CPU and memory usage are used as criteria, with memory being weighted
much more, because it's a truly limited resource. For both CPU and
memory, the highest usage among nodes (weighted more, as ideally no
node should be overcommitted) and the average usage of all nodes (to
still be able to distinguish in case there already is a more highly
committed node) are considered.
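
For illustration, a minimal usage sketch of the module (invented
example values; a higher score is better):

    use anyhow::Error;
    use proxmox_resource_scheduling::pve_static::{
        score_nodes_to_start_service, StaticNodeUsage, StaticServiceUsage,
    };

    fn main() -> Result<(), Error> {
        let node_a = StaticNodeUsage {
            name: "A".to_string(),
            cpu: 2.0,
            maxcpu: 10,
            mem: 30_000_000_000,
            maxmem: 100_000_000_000,
        };
        let node_b = StaticNodeUsage {
            name: "B".to_string(),
            cpu: 1.0,
            maxcpu: 20,
            mem: 10_000_000_000,
            maxmem: 200_000_000_000,
        };
        let service = StaticServiceUsage {
            maxcpu: 4.0,
            maxmem: 20_000_000_000,
        };

        for (node, score) in score_nodes_to_start_service(&[&node_a, &node_b], &service)? {
            println!("{node}: {score:.3}");
        }
        Ok(())
    }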

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 Cargo.toml        |   2 +
 src/lib.rs        |   1 +
 src/pve_static.rs | 143 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 146 insertions(+)
 create mode 100644 src/pve_static.rs

diff --git a/Cargo.toml b/Cargo.toml
index ec8e12f..85d0ec1 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -18,3 +18,5 @@ path = "src/lib.rs"
 
 [dependencies]
 anyhow = "1.0"
+lazy_static = "1.4"
+serde = { version = "1.0", features = ["derive"] }
diff --git a/src/lib.rs b/src/lib.rs
index dda0563..c82bd40 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1 +1,2 @@
+pub mod pve_static;
 pub mod topsis;
diff --git a/src/pve_static.rs b/src/pve_static.rs
new file mode 100644
index 0000000..cb8a823
--- /dev/null
+++ b/src/pve_static.rs
@@ -0,0 +1,143 @@
+use anyhow::Error;
+use lazy_static::lazy_static;
+use serde::{Deserialize, Serialize};
+
+use crate::topsis::{TopsisCriteria, TopsisCriterion, TopsisMatrix};
+
+#[derive(Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+/// Static usage information of a node.
+pub struct StaticNodeUsage {
+    /// Hostname of the node.
+    pub name: String,
+    /// CPU utilization. Can be more than `maxcpu` if overcommitted.
+    pub cpu: f64,
+    /// Total number of CPUs.
+    pub maxcpu: usize,
+    /// Used memory in bytes. Can be more than `maxmem` if overcommitted.
+    pub mem: usize,
+    /// Total memory in bytes.
+    pub maxmem: usize,
+}
+
+#[derive(Serialize, Deserialize)]
+#[serde(rename_all = "kebab-case")]
+/// Static usage information of an HA resource.
+pub struct StaticServiceUsage {
+    /// Number of assigned CPUs or CPU limit.
+    pub maxcpu: f64,
+    /// Maximum assigned memory in bytes.
+    pub maxmem: usize,
+}
+
+/// Calculate new CPU usage in percent.
+/// `add` being `0.0` means "unlimited" and results in `max` being added.
+fn add_cpu_usage(old: f64, max: f64, add: f64) -> f64 {
+    if add == 0.0 {
+        old + max
+    } else {
+        old + add
+    }
+}
+
+impl StaticNodeUsage {
+    /// Add usage of `service` to the node's usage.
+    pub fn add_service_usage(&mut self, service: &StaticServiceUsage) {
+        self.cpu = add_cpu_usage(self.cpu, self.maxcpu as f64, service.maxcpu);
+        self.mem += service.maxmem;
+    }
+}
+
+/// A given alternative.
+struct PveTopsisAlternative {
+    average_cpu: f64,
+    highest_cpu: f64,
+    average_memory: f64,
+    highest_memory: f64,
+}
+
+const N_CRITERIA: usize = 4;
+
+// NOTE: It is essential that the order of the criteria definition and the order in the
+// From<PveTopsisAlternative> implementation match up.
+
+lazy_static! {
+    static ref PVE_HA_TOPSIS_CRITERIA: TopsisCriteria<N_CRITERIA> = TopsisCriteria::new([
+        TopsisCriterion::new("average CPU".to_string(), -1.0),
+        TopsisCriterion::new("highest CPU".to_string(), -2.0),
+        TopsisCriterion::new("average memory".to_string(), -5.0),
+        TopsisCriterion::new("highest memory".to_string(), -10.0),
+    ])
+    .unwrap();
+}
+
+impl From<PveTopsisAlternative> for [f64; N_CRITERIA] {
+    fn from(alternative: PveTopsisAlternative) -> Self {
+        [
+            alternative.average_cpu,
+            alternative.highest_cpu,
+            alternative.average_memory,
+            alternative.highest_memory,
+        ]
+    }
+}
+
+/// Scores candidate `nodes` to start a `service` on. Scoring is done according to the static memory
+/// and CPU usages of the nodes as if the service would already be running on each.
+///
+/// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher score
+/// is better.
+pub fn score_nodes_to_start_service(
+    nodes: &[&StaticNodeUsage],
+    service: &StaticServiceUsage,
+) -> Result<Vec<(String, f64)>, Error> {
+    let len = nodes.len();
+
+    let matrix = nodes
+        .iter()
+        .enumerate()
+        .map(|(target_index, _)| {
+            // all of these are as percentages to be comparable across nodes
+            let mut highest_cpu = 0.0;
+            let mut sum_cpu = 0.0;
+            let mut highest_mem = 0.0;
+            let mut sum_mem = 0.0;
+
+            for (index, node) in nodes.iter().enumerate() {
+                let new_cpu = if index == target_index {
+                    add_cpu_usage(node.cpu, node.maxcpu as f64, service.maxcpu)
+                } else {
+                    node.cpu
+                } / (node.maxcpu as f64);
+                highest_cpu = f64::max(highest_cpu, new_cpu);
+                sum_cpu += new_cpu;
+
+                let new_mem = if index == target_index {
+                    node.mem + service.maxmem
+                } else {
+                    node.mem
+                } as f64
+                    / node.maxmem as f64;
+                highest_mem = f64::max(highest_mem, new_mem);
+                sum_mem += new_mem;
+            }
+
+            PveTopsisAlternative {
+                average_cpu: sum_cpu / len as f64,
+                highest_cpu,
+                average_memory: sum_mem / len as f64,
+                highest_memory: highest_mem,
+            }
+            .into()
+        })
+        .collect::<Vec<_>>();
+
+    let scores =
+        crate::topsis::score_alternatives(&TopsisMatrix::new(matrix)?, &PVE_HA_TOPSIS_CRITERIA)?;
+
+    Ok(scores
+        .into_iter()
+        .enumerate()
+        .map(|(n, score)| (nodes[n].name.clone(), score))
+        .collect())
+}
-- 
2.30.2






* [pve-devel] [PATCH proxmox-resource-scheduling 3/3] add Debian packaging
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module Fiona Ebner
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 .gitignore           |  5 ++++
 Makefile             | 67 ++++++++++++++++++++++++++++++++++++++++++++
 debian/changelog     |  5 ++++
 debian/control       | 39 ++++++++++++++++++++++++++
 debian/copyright     | 16 +++++++++++
 debian/debcargo.toml |  7 +++++
 6 files changed, 139 insertions(+)
 create mode 100644 Makefile
 create mode 100644 debian/changelog
 create mode 100644 debian/control
 create mode 100644 debian/copyright
 create mode 100644 debian/debcargo.toml

diff --git a/.gitignore b/.gitignore
index 1e7caa9..5e0308e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,7 @@
 Cargo.lock
 target/
+proxmox-resource-scheduling-*/
+*proxmox-resource-scheduling*.buildinfo
+*proxmox-resource-scheduling*.tar.?z
+*proxmox-resource-scheduling*.changes
+*proxmox-resource-scheduling*.deb
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..27b183d
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,67 @@
+include /usr/share/dpkg/pkg-info.mk
+include /usr/share/dpkg/architecture.mk
+
+PACKAGE=proxmox-resource-scheduling
+BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION_UPSTREAM)
+BUILDDIR_TMP ?= $(BUILDDIR).tmp
+
+DEB=librust-$(PACKAGE)-dev_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
+DSC=rust-$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION).dsc
+
+ifeq ($(BUILD_MODE), release)
+CARGO_BUILD_ARGS += --release
+COMPILEDIR := target/release
+else
+COMPILEDIR := target/debug
+endif
+
+all: cargo-build $(SUBDIRS)
+
+.PHONY: cargo-build
+cargo-build:
+	cargo build $(CARGO_BUILD_ARGS)
+
+.PHONY: build
+build:
+	rm -rf $(BUILDDIR) $(BUILDDIR_TMP); mkdir $(BUILDDIR_TMP)
+	rm -f debian/control
+	debcargo package \
+	  --config debian/debcargo.toml \
+	  --changelog-ready \
+	  --no-overlay-write-back \
+	  --directory $(BUILDDIR_TMP) \
+	  $(PACKAGE) \
+	  $(shell dpkg-parsechangelog -l debian/changelog -SVersion | sed -e 's/-.*//')
+	cp $(BUILDDIR_TMP)/debian/control debian/control
+	rm -f $(BUILDDIR_TMP)/Cargo.lock
+	find $(BUILDDIR_TMP)/debian -name "*.hint" -delete
+	mv $(BUILDDIR_TMP) $(BUILDDIR)
+
+.PHONY: deb
+deb: $(DEB)
+$(DEB): build
+	cd $(BUILDDIR); dpkg-buildpackage -b -us -uc --no-pre-clean
+	lintian $(DEB)
+
+.PHONY: dsc
+dsc: $(DSC)
+$(DSC): build
+	cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d -nc
+	lintian $(DSC)
+
+.PHONY: dinstall
+dinstall: $(DEB)
+	dpkg -i $(DEB)
+
+.PHONY: upload
+upload: $(DEB)
+	tar cf - $(DEB) | ssh -X repoman@repo.proxmox.com -- upload --product devel --dist bullseye --arch $(DEB_BUILD_ARCH)
+
+.PHONY: distclean
+distclean: clean
+
+.PHONY: clean
+clean:
+	cargo clean
+	rm -rf *.deb *.buildinfo *.changes *.dsc rust-$(PACKAGE)_*.tar.?z $(BUILDDIR) $(BUILDDIR_TMP)
+	find . -name '*~' -exec rm {} ';'
diff --git a/debian/changelog b/debian/changelog
new file mode 100644
index 0000000..eaa3064
--- /dev/null
+++ b/debian/changelog
@@ -0,0 +1,5 @@
+rust-proxmox-resource-scheduling (0.1.0-1) stable; urgency=medium
+
+  * Initial release.
+
+ --  Proxmox Support Team <support@proxmox.com>  Mon, 24 Oct 2022 11:32:11 +0200
diff --git a/debian/control b/debian/control
new file mode 100644
index 0000000..697926b
--- /dev/null
+++ b/debian/control
@@ -0,0 +1,39 @@
+Source: rust-proxmox-resource-scheduling
+Section: rust
+Priority: optional
+Build-Depends: debhelper (>= 12),
+ dh-cargo (>= 25),
+ cargo:native <!nocheck>,
+ rustc:native <!nocheck>,
+ libstd-rust-dev <!nocheck>,
+ librust-anyhow-1+default-dev <!nocheck>,
+ librust-lazy-static-1+default-dev (>= 1.4-~~) <!nocheck>,
+ librust-serde-1+default-dev <!nocheck>,
+ librust-serde-1+derive-dev <!nocheck>
+Maintainer: Proxmox Support Team <support@proxmox.com>
+Standards-Version: 4.5.1
+Vcs-Git: git://git.proxmox.com/git/proxmox-resource-scheduling.git
+Vcs-Browser: https://git.proxmox.com/?p=proxmox-resource-scheduling.git
+Homepage: https://www.proxmox.com
+Rules-Requires-Root: no
+
+Package: librust-proxmox-resource-scheduling-dev
+Architecture: any
+Multi-Arch: same
+Depends:
+ ${misc:Depends},
+ librust-anyhow-1+default-dev,
+ librust-lazy-static-1+default-dev (>= 1.4-~~),
+ librust-serde-1+default-dev,
+ librust-serde-1+derive-dev
+Provides:
+ librust-proxmox-resource-scheduling+default-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0+default-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0.1-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0.1+default-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0.1.0-dev (= ${binary:Version}),
+ librust-proxmox-resource-scheduling-0.1.0+default-dev (= ${binary:Version})
+Description: Proxmox library for resource scheduling - Rust source code
+ This package contains the source for the Rust proxmox-resource-scheduling
+ crate, packaged by debcargo for use with cargo and dh-cargo.
diff --git a/debian/copyright b/debian/copyright
new file mode 100644
index 0000000..d2d30fc
--- /dev/null
+++ b/debian/copyright
@@ -0,0 +1,16 @@
+Copyright (C) 2022 Proxmox Server Solutions GmbH
+
+This software is written by Proxmox Server Solutions GmbH <support@proxmox.com>
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>.
diff --git a/debian/debcargo.toml b/debian/debcargo.toml
new file mode 100644
index 0000000..7ff3d06
--- /dev/null
+++ b/debian/debcargo.toml
@@ -0,0 +1,7 @@
+overlay = "."
+crate_src_path = ".."
+maintainer = "Proxmox Support Team <support@proxmox.com>"
+
+[source]
+vcs_git = "git://git.proxmox.com/git/proxmox-resource-scheduling.git"
+vcs_browser = "https://git.proxmox.com/?p=proxmox-resource-scheduling.git"
-- 
2.30.2






* [pve-devel] [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (2 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 3/3] add Debian packaging Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-15 10:16   ` [pve-devel] applied-series: " Wolfgang Bumiller
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] add basic test for resource scheduling Fiona Ebner
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

backed by the proxmox-resource-scheduling crate.

Initially to be used by the HA manager, to allow it to base its
decision about where to start a new or recovered service on static
usage information rather than just counting services.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 Makefile                                 |   1 +
 pve-rs/Cargo.toml                        |   1 +
 pve-rs/src/lib.rs                        |   1 +
 pve-rs/src/resource_scheduling/mod.rs    |   1 +
 pve-rs/src/resource_scheduling/static.rs | 116 +++++++++++++++++++++++
 5 files changed, 120 insertions(+)
 create mode 100644 pve-rs/src/resource_scheduling/mod.rs
 create mode 100644 pve-rs/src/resource_scheduling/static.rs

diff --git a/Makefile b/Makefile
index 0836c9d..3ddafd0 100644
--- a/Makefile
+++ b/Makefile
@@ -56,6 +56,7 @@ gen:
 	perl ./scripts/genpackage.pl PVE \
 	  PVE::RS::APT::Repositories \
 	  PVE::RS::OpenId \
+	  PVE::RS::ResourceScheduling::Static \
 	  PVE::RS::TFA
 	perl ./scripts/genpackage.pl PMG \
 	  PMG::RS::APT::Repositories \
diff --git a/pve-rs/Cargo.toml b/pve-rs/Cargo.toml
index 855c72d..daa6bff 100644
--- a/pve-rs/Cargo.toml
+++ b/pve-rs/Cargo.toml
@@ -33,6 +33,7 @@ perlmod = { version = "0.13", features = [ "exporter" ] }
 proxmox-apt = "0.9"
 proxmox-http = { version = "0.7", features = ["client-sync", "client-trait"] }
 proxmox-openid = "0.9.5"
+proxmox-resource-scheduling = "0.1"
 proxmox-subscription = "0.3"
 proxmox-sys = "0.4"
 proxmox-tfa = { version = "2.1", features = ["api"] }
diff --git a/pve-rs/src/lib.rs b/pve-rs/src/lib.rs
index 26b998b..562a4d4 100644
--- a/pve-rs/src/lib.rs
+++ b/pve-rs/src/lib.rs
@@ -5,4 +5,5 @@ pub mod common;
 
 pub mod apt;
 pub mod openid;
+pub mod resource_scheduling;
 pub mod tfa;
diff --git a/pve-rs/src/resource_scheduling/mod.rs b/pve-rs/src/resource_scheduling/mod.rs
new file mode 100644
index 0000000..a28f1c9
--- /dev/null
+++ b/pve-rs/src/resource_scheduling/mod.rs
@@ -0,0 +1 @@
+pub mod r#static;
diff --git a/pve-rs/src/resource_scheduling/static.rs b/pve-rs/src/resource_scheduling/static.rs
new file mode 100644
index 0000000..c47dcd3
--- /dev/null
+++ b/pve-rs/src/resource_scheduling/static.rs
@@ -0,0 +1,116 @@
+#[perlmod::package(name = "PVE::RS::ResourceScheduling::Static", lib = "pve_rs")]
+mod export {
+    use std::collections::HashMap;
+    use std::sync::Mutex;
+
+    use anyhow::{bail, Error};
+
+    use perlmod::Value;
+    use proxmox_resource_scheduling::pve_static::{StaticNodeUsage, StaticServiceUsage};
+
+    perlmod::declare_magic!(Box<Scheduler> : &Scheduler as "PVE::RS::ResourceScheduling::Static");
+
+    struct Usage {
+        nodes: HashMap<String, StaticNodeUsage>,
+    }
+
+    pub struct Scheduler {
+        inner: Mutex<Usage>,
+    }
+
+    #[export(raw_return)]
+    fn new(#[raw] class: Value) -> Result<Value, Error> {
+        let inner = Usage {
+            nodes: HashMap::new(),
+        };
+
+        Ok(perlmod::instantiate_magic!(
+            &class, MAGIC => Box::new(Scheduler { inner: Mutex::new(inner) })
+        ))
+    }
+
+    #[export]
+    fn add_node(
+        #[try_from_ref] this: &Scheduler,
+        nodename: String,
+        maxcpu: usize,
+        maxmem: usize,
+    ) -> Result<(), Error> {
+        let mut usage = this.inner.lock().unwrap();
+
+        if usage.nodes.contains_key(&nodename) {
+            bail!("node {} already added", nodename);
+        }
+
+        let node = StaticNodeUsage {
+            name: nodename.clone(),
+            cpu: 0.0,
+            maxcpu,
+            mem: 0,
+            maxmem,
+        };
+
+        usage.nodes.insert(nodename, node);
+        Ok(())
+    }
+
+    #[export]
+    fn remove_node(#[try_from_ref] this: &Scheduler, nodename: &str) {
+        let mut usage = this.inner.lock().unwrap();
+
+        usage.nodes.remove(nodename);
+    }
+
+    #[export]
+    fn list_nodes(#[try_from_ref] this: &Scheduler) -> Vec<String> {
+        let usage = this.inner.lock().unwrap();
+
+        usage
+            .nodes
+            .keys()
+            .map(|nodename| nodename.to_string())
+            .collect()
+    }
+
+    #[export]
+    fn contains_node(#[try_from_ref] this: &Scheduler, nodename: &str) -> bool {
+        let usage = this.inner.lock().unwrap();
+
+        usage.nodes.contains_key(nodename)
+    }
+
+    /// Add usage of `service` to the node's usage.
+    #[export]
+    fn add_service_usage_to_node(
+        #[try_from_ref] this: &Scheduler,
+        nodename: &str,
+        service: StaticServiceUsage,
+    ) -> Result<(), Error> {
+        let mut usage = this.inner.lock().unwrap();
+
+        match usage.nodes.get_mut(nodename) {
+            Some(node) => {
+                node.add_service_usage(&service);
+                Ok(())
+            }
+            None => bail!("node '{}' not present in usage hashmap", nodename),
+        }
+    }
+
+    /// Scores all previously added nodes for starting a `service` on. Scoring is done according to
+    /// the static memory and CPU usages of the nodes as if the service would already be running on
+    /// each.
+    ///
+    /// Returns a vector of (nodename, score) pairs. Scores are between 0.0 and 1.0 and a higher
+    /// score is better.
+    #[export]
+    fn score_nodes_to_start_service(
+        #[try_from_ref] this: &Scheduler,
+        service: StaticServiceUsage,
+    ) -> Result<Vec<(String, f64)>, Error> {
+        let usage = this.inner.lock().unwrap();
+        let nodes = usage.nodes.values().collect::<Vec<&StaticNodeUsage>>();
+
+        proxmox_resource_scheduling::pve_static::score_nodes_to_start_service(&nodes, &service)
+    }
+}
-- 
2.30.2






* [pve-devel] [PATCH proxmox-perl-rs 2/2] add basic test for resource scheduling
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (3 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH manager 1/3] pvestatd: broadcast static node information Fiona Ebner
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Currently this is only used to test the installed version and is not
run automatically during build. See the FIXME note for why.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 pve-rs/test/Makefile               |  4 ++
 pve-rs/test/README                 |  2 +
 pve-rs/test/resource_scheduling.pl | 70 ++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+)
 create mode 100644 pve-rs/test/Makefile
 create mode 100644 pve-rs/test/README
 create mode 100755 pve-rs/test/resource_scheduling.pl

diff --git a/pve-rs/test/Makefile b/pve-rs/test/Makefile
new file mode 100644
index 0000000..dc0a5bd
--- /dev/null
+++ b/pve-rs/test/Makefile
@@ -0,0 +1,4 @@
+.PHONY: test
+test:
+	@echo "-- running pve-rs tests --"
+	./resource_scheduling.pl
diff --git a/pve-rs/test/README b/pve-rs/test/README
new file mode 100644
index 0000000..662b131
--- /dev/null
+++ b/pve-rs/test/README
@@ -0,0 +1,2 @@
+Note that these tests are not run during build currently! See the FIXME note in
+resource_scheduling.pl for why.
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
new file mode 100755
index 0000000..b55843c
--- /dev/null
+++ b/pve-rs/test/resource_scheduling.pl
@@ -0,0 +1,70 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+# FIXME ensure that the just built library is loaded rather than the installed one and add a test
+# target to pve-rs/Makefile afterwards. Issue is that the loader looks into an $PATH/auto directory,
+# so it's not enough to use lib qw(../target/release)
+# Also might be a good idea to test for existence of the files to avoid surprises if the directory
+# structure changes in the future.
+#use lib qw(..);
+#use lib qw(../target/release);
+use PVE::RS::ResourceScheduling::Static;
+
+sub assert_num_eq {
+    my ($left, $right) = @_;
+    my (undef, undef, $line) = caller();
+    die "assertion failed: '$left != $right' at line $line\n" if $left != $right;
+}
+
+sub assert_str_eq {
+    my ($left, $right) = @_;
+    my (undef, undef, $line) = caller();
+    die "assertion failed: '$left ne $right' at line $line\n" if $left ne $right;
+}
+
+sub assert {
+    my ($bool) = @_;
+    my (undef, undef, $line) = caller();
+    die "assertion failed at line $line\n" if !$bool;
+}
+
+my $static = PVE::RS::ResourceScheduling::Static->new();
+assert_num_eq(scalar($static->list_nodes()->@*), 0);
+$static->add_node("A", 10, 100_000_000_000);
+assert_num_eq(scalar($static->list_nodes()->@*), 1);
+$static->add_node("B", 20, 200_000_000_000);
+assert_num_eq(scalar($static->list_nodes()->@*), 2);
+$static->add_node("C", 30, 300_000_000_000);
+assert_num_eq(scalar($static->list_nodes()->@*), 3);
+$static->remove_node("C");
+assert_num_eq(scalar($static->list_nodes()->@*), 2);
+assert($static->contains_node("A"));
+assert($static->contains_node("B"));
+assert(!$static->contains_node("C"));
+
+my $service = {
+    maxcpu => 4,
+    maxmem => 20_000_000_000,
+};
+
+for (my $i = 0; $i < 15; $i++) {
+    my $score_list = $static->score_nodes_to_start_service($service);
+
+    # imitate HA manager
+    my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };
+    my @nodes = sort {
+	$scores->{$a} <=> $scores->{$b} || $a cmp $b
+    } keys $scores->%*;
+
+    if ($i % 3 == 2) {
+	assert_str_eq($nodes[0], "A");
+	assert_str_eq($nodes[1], "B");
+    } else {
+	assert_str_eq($nodes[0], "B");
+	assert_str_eq($nodes[1], "A");
+    }
+
+    $static->add_service_usage_to_node($nodes[0], $service);
+}
-- 
2.30.2






* [pve-devel] [PATCH manager 1/3] pvestatd: broadcast static node information
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (4 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] add basic test for resource scheduling Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH v3 manager 2/3] cluster resources: add cgroup-mode to node properties Fiona Ebner
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Planned to be used for static resource scheduling in the HA manager.

It's enough to broadcast the values whenever they are outdated or not
set in the node's local kv store, because pmxcfs will re-broadcast the
local kv store whenever the quorate partition changes. This is already
relied upon for the 'ceph-versions' kv pair.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

To avoid potential left-over stale information, the following
patch would help:
https://lists.proxmox.com/pipermail/pve-devel/2022-October/054219.html

 PVE/Service/pvestatd.pm | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/PVE/Service/pvestatd.pm b/PVE/Service/pvestatd.pm
index eac953df..eb0dc130 100755
--- a/PVE/Service/pvestatd.pm
+++ b/PVE/Service/pvestatd.pm
@@ -123,6 +123,24 @@ my $generate_rrd_string = sub {
     return join(':', map { $_ // 'U' } @$data);
 };
 
+my sub broadcast_static_node_info {
+    my ($cpus, $memory) = @_;
+
+    my $old = PVE::Cluster::get_node_kv('static-info', $nodename);
+    $old = eval { decode_json($old->{$nodename}) } if defined($old->{$nodename});
+
+    if (
+	!defined($old->{cpus}) || $old->{cpus} != $cpus
+	|| !defined($old->{memory}) || $old->{memory} != $memory
+    ) {
+	my $info = {
+	    cpus => $cpus,
+	    memory => $memory,
+	};
+	PVE::Cluster::broadcast_node_kv('static-info', encode_json($info));
+    }
+}
+
 sub update_node_status {
     my ($status_cfg) = @_;
 
@@ -175,6 +193,8 @@ sub update_node_status {
     my $transactions = PVE::ExtMetric::transactions_start($status_cfg);
     PVE::ExtMetric::update_all($transactions, 'node', $nodename, $node_metric, $ctime);
     PVE::ExtMetric::transactions_finish($transactions);
+
+    broadcast_static_node_info($maxcpu, $meminfo->{memtotal});
 }
 
 sub auto_balloning {
-- 
2.30.2






* [pve-devel] [PATCH v3 manager 2/3] cluster resources: add cgroup-mode to node properties
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (5 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH manager 1/3] pvestatd: broadcast static node information Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH v2 manager 3/3] ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version Fiona Ebner
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

so the frontend has the information readily available.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v2:
    * Add it to the new 'static-info' kv instead of being standalone.
    * Fix params for get_node_kv() call (already in previous patch).

 PVE/API2/Cluster.pm     | 13 +++++++++++++
 PVE/Service/pvestatd.pm |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/PVE/API2/Cluster.pm b/PVE/API2/Cluster.pm
index 49e319a5..dc575d85 100644
--- a/PVE/API2/Cluster.pm
+++ b/PVE/API2/Cluster.pm
@@ -323,6 +323,11 @@ __PACKAGE__->register_method({
 		    optional => 1,
 		    minimum => 1,
 		},
+		'cgroup-mode' => {
+		    description => "The cgroup mode the node operates under (when type == node).",
+		    type => 'integer',
+		    optional => 1,
+		},
 	    },
 	},
     },
@@ -427,10 +432,18 @@ __PACKAGE__->register_method({
 	    }
 	}
 
+	my $static_node_info = PVE::Cluster::get_node_kv("static-info");
+
 	if (!$param->{type} || $param->{type} eq 'node') {
 	    foreach my $node (@$nodelist) {
 		my $can_audit = $rpcenv->check($authuser, "/nodes/$node", [ 'Sys.Audit' ], 1);
 		my $entry = PVE::API2Tools::extract_node_stats($node, $members, $rrd, !$can_audit);
+
+		my $info = eval { decode_json($static_node_info->{$node}); };
+		if (defined(my $mode = $info->{'cgroup-mode'})) {
+		    $entry->{'cgroup-mode'} = int($mode);
+		}
+
 		push @$res, $entry;
 	    }
 	}
diff --git a/PVE/Service/pvestatd.pm b/PVE/Service/pvestatd.pm
index eb0dc130..2515120c 100755
--- a/PVE/Service/pvestatd.pm
+++ b/PVE/Service/pvestatd.pm
@@ -126,17 +126,22 @@ my $generate_rrd_string = sub {
 my sub broadcast_static_node_info {
     my ($cpus, $memory) = @_;
 
+    my $cgroup_mode = eval { PVE::CGroup::cgroup_mode(); };
+    syslog('err', "cgroup mode error: $@") if $@;
+
     my $old = PVE::Cluster::get_node_kv('static-info', $nodename);
     $old = eval { decode_json($old->{$nodename}) } if defined($old->{$nodename});
 
     if (
 	!defined($old->{cpus}) || $old->{cpus} != $cpus
 	|| !defined($old->{memory}) || $old->{memory} != $memory
+	|| ($old->{'cgroup-mode'} // -1) != ($cgroup_mode // -1)
     ) {
 	my $info = {
 	    cpus => $cpus,
 	    memory => $memory,
 	};
+	$info->{'cgroup-mode'} = $cgroup_mode if defined($cgroup_mode);
 	PVE::Cluster::broadcast_node_kv('static-info', encode_json($info));
     }
 }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH v2 manager 3/3] ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (6 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH v3 manager 2/3] cluster resources: add cgroup-mode to node properties Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options Fiona Ebner
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

so that the default value and limits actually correspond to what will
be used. Defaults to values for cgroup v2, because that is the more
common scenario.
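
For reference, the defaults and limits the formulas below encode are:

    cgroup v1: default 1024, VM range 2..262144, CT range 8..500000
    cgroup v2: default 100,  VM range 1..10000,  CT range 8..10000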

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 www/manager6/lxc/CreateWizard.js   |  8 ++++++++
 www/manager6/lxc/ResourceEdit.js   | 31 +++++++++++++++++++++++++-----
 www/manager6/lxc/Resources.js      |  8 +++++++-
 www/manager6/qemu/CreateWizard.js  |  8 ++++++++
 www/manager6/qemu/HardwareView.js  |  8 +++++++-
 www/manager6/qemu/ProcessorEdit.js | 31 +++++++++++++++++++++++-------
 6 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/www/manager6/lxc/CreateWizard.js b/www/manager6/lxc/CreateWizard.js
index 1f902c2c..0b82cc1c 100644
--- a/www/manager6/lxc/CreateWizard.js
+++ b/www/manager6/lxc/CreateWizard.js
@@ -8,6 +8,14 @@ Ext.define('PVE.lxc.CreateWizard', {
 	    storage: '',
 	    unprivileged: true,
 	},
+	formulas: {
+	    cgroupMode: function(get) {
+		const nodeInfo = PVE.data.ResourceStore.getNodes().find(
+		    node => node.node === get('nodename'),
+		);
+		return nodeInfo ? nodeInfo['cgroup-mode'] : 2;
+	    },
+	},
     },
 
     cbindData: {
diff --git a/www/manager6/lxc/ResourceEdit.js b/www/manager6/lxc/ResourceEdit.js
index 42357d85..9f4f7e08 100644
--- a/www/manager6/lxc/ResourceEdit.js
+++ b/www/manager6/lxc/ResourceEdit.js
@@ -20,9 +20,17 @@ Ext.define('PVE.lxc.MemoryEdit', {
 
 Ext.define('PVE.lxc.CPUEdit', {
     extend: 'Proxmox.window.Edit',
+    alias: 'widget.pveLxcCPUEdit',
+
+    viewModel: {
+	data: {
+	    cgroupMode: 2,
+	},
+    },
 
     initComponent: function() {
-	var me = this;
+	let me = this;
+	me.getViewModel().set('cgroupMode', me.cgroupMode);
 
 	Ext.apply(me, {
 	    subject: gettext('CPU'),
@@ -35,6 +43,7 @@ Ext.define('PVE.lxc.CPUEdit', {
     },
 });
 
+// The view model of the parent should contain a 'cgroupMode' variable (or params for v2 are used).
 Ext.define('PVE.lxc.CPUInputPanel', {
     extend: 'Proxmox.panel.InputPanel',
     alias: 'widget.pveLxcCPUInputPanel',
@@ -43,11 +52,19 @@ Ext.define('PVE.lxc.CPUInputPanel', {
 
     insideWizard: false,
 
+    viewModel: {
+	formulas: {
+	    cpuunitsDefault: (get) => get('cgroupMode') === 1 ? 1024 : 100,
+	    cpuunitsMax: (get) => get('cgroupMode') === 1 ? 500000 : 10000,
+	},
+    },
+
     onGetValues: function(values) {
-	var me = this;
+	let me = this;
+	let cpuunitsDefault = me.getViewModel().get('cpuunitsDefault');
 
 	PVE.Utils.delete_if_default(values, 'cpulimit', '0', me.insideWizard);
-	PVE.Utils.delete_if_default(values, 'cpuunits', '1024', me.insideWizard);
+	PVE.Utils.delete_if_default(values, 'cpuunits', `${cpuunitsDefault}`, me.insideWizard);
 
 	return values;
     },
@@ -72,8 +89,12 @@ Ext.define('PVE.lxc.CPUInputPanel', {
 	    fieldLabel: gettext('CPU units'),
 	    value: '',
 	    minValue: 8,
-	    maxValue: 500000,
-	    emptyText: '1024',
+	    maxValue: '10000',
+	    emptyText: '100',
+	    bind: {
+		emptyText: '{cpuunitsDefault}',
+		maxValue: '{cpuunitsMax}',
+	    },
 	    labelWidth: labelWidth,
 	    deleteEmpty: true,
 	    allowBlank: true,
diff --git a/www/manager6/lxc/Resources.js b/www/manager6/lxc/Resources.js
index 4b2ae95e..85112345 100644
--- a/www/manager6/lxc/Resources.js
+++ b/www/manager6/lxc/Resources.js
@@ -45,6 +45,12 @@ Ext.define('PVE.lxc.RessourceView', {
 
 	var mpeditor = caps.vms['VM.Config.Disk'] ? 'PVE.lxc.MountPointEdit' : undefined;
 
+	const nodeInfo = PVE.data.ResourceStore.getNodes().find(node => node.node === nodename);
+	let cpuEditor = {
+	    xtype: 'pveLxcCPUEdit',
+	    cgroupMode: nodeInfo['cgroup-mode'],
+	};
+
 	var rows = {
 	    memory: {
 		header: gettext('Memory'),
@@ -68,7 +74,7 @@ Ext.define('PVE.lxc.RessourceView', {
 	    },
 	    cores: {
 		header: gettext('Cores'),
-		editor: caps.vms['VM.Config.CPU'] ? 'PVE.lxc.CPUEdit' : undefined,
+		editor: caps.vms['VM.Config.CPU'] ? cpuEditor : undefined,
 		defaultValue: '',
 		tdCls: 'pmx-itype-icon-processor',
 		group: 3,
diff --git a/www/manager6/qemu/CreateWizard.js b/www/manager6/qemu/CreateWizard.js
index a785a882..a65067ea 100644
--- a/www/manager6/qemu/CreateWizard.js
+++ b/www/manager6/qemu/CreateWizard.js
@@ -10,6 +10,14 @@ Ext.define('PVE.qemu.CreateWizard', {
 		scsihw: '',
 	    },
 	},
+	formulas: {
+	    cgroupMode: function(get) {
+		const nodeInfo = PVE.data.ResourceStore.getNodes().find(
+		    node => node.node === get('nodename'),
+		);
+		return nodeInfo ? nodeInfo['cgroup-mode'] : 2;
+	    },
+	},
     },
 
     cbindData: {
diff --git a/www/manager6/qemu/HardwareView.js b/www/manager6/qemu/HardwareView.js
index 6e9d03b4..eb069b30 100644
--- a/www/manager6/qemu/HardwareView.js
+++ b/www/manager6/qemu/HardwareView.js
@@ -59,6 +59,12 @@ Ext.define('PVE.qemu.HardwareView', {
 
 	let isCloudInitKey = v => v && v.toString().match(/vm-.*-cloudinit/);
 
+	const nodeInfo = PVE.data.ResourceStore.getNodes().find(node => node.node === nodename);
+	let processorEditor = {
+	    xtype: 'pveQemuProcessorEdit',
+	    cgroupMode: nodeInfo['cgroup-mode'],
+	};
+
 	let rows = {
 	    memory: {
 		header: gettext('Memory'),
@@ -93,7 +99,7 @@ Ext.define('PVE.qemu.HardwareView', {
 		header: gettext('Processors'),
 		never_delete: true,
 		editor: caps.vms['VM.Config.CPU'] || caps.vms['VM.Config.HWType']
-		    ? 'PVE.qemu.ProcessorEdit' : undefined,
+		    ? processorEditor : undefined,
 		tdCls: 'pve-itype-icon-cpu',
 		group: 3,
 		defaultValue: '1',
diff --git a/www/manager6/qemu/ProcessorEdit.js b/www/manager6/qemu/ProcessorEdit.js
index fe67c7f5..8e9f94aa 100644
--- a/www/manager6/qemu/ProcessorEdit.js
+++ b/www/manager6/qemu/ProcessorEdit.js
@@ -1,3 +1,4 @@
+// The view model of the parent should contain a 'cgroupMode' variable (or params for v2 are used).
 Ext.define('PVE.qemu.ProcessorInputPanel', {
     extend: 'Proxmox.panel.InputPanel',
     alias: 'widget.pveQemuProcessorPanel',
@@ -13,6 +14,9 @@ Ext.define('PVE.qemu.ProcessorInputPanel', {
 	},
 	formulas: {
 	    totalCoreCount: get => get('socketCount') * get('coreCount'),
+	    cpuunitsDefault: (get) => get('cgroupMode') === 1 ? 1024 : 100,
+	    cpuunitsMin: (get) => get('cgroupMode') === 1 ? 2 : 1,
+	    cpuunitsMax: (get) => get('cgroupMode') === 1 ? 262144 : 10000,
 	},
     },
 
@@ -21,7 +25,8 @@ Ext.define('PVE.qemu.ProcessorInputPanel', {
     },
 
     onGetValues: function(values) {
-	var me = this;
+	let me = this;
+	let cpuunitsDefault = me.getViewModel().get('cpuunitsDefault');
 
 	if (Array.isArray(values.delete)) {
 	    values.delete = values.delete.join(',');
@@ -39,7 +44,7 @@ Ext.define('PVE.qemu.ProcessorInputPanel', {
 	}
 
 	PVE.Utils.delete_if_default(values, 'cpulimit', '0', me.insideWizard);
-	PVE.Utils.delete_if_default(values, 'cpuunits', '1024', me.insideWizard);
+	PVE.Utils.delete_if_default(values, 'cpuunits', `${cpuunitsDefault}`, me.insideWizard);
 
 	// build the cpu options:
 	me.cpu.cputype = values.cputype;
@@ -210,11 +215,15 @@ Ext.define('PVE.qemu.ProcessorInputPanel', {
 	    xtype: 'proxmoxintegerfield',
 	    name: 'cpuunits',
 	    fieldLabel: gettext('CPU units'),
-	    // FIXME: change to [1, 1000] once cgroup v1 support gets removed (PVE 8 ?)
-	    minValue: 2,
-	    maxValue: 262144,
+	    minValue: '1',
+	    maxValue: '10000',
 	    value: '',
-	    emptyText: '1024',
+	    emptyText: '100',
+	    bind: {
+		minValue: '{cpuunitsMin}',
+		maxValue: '{cpuunitsMax}',
+		emptyText: '{cpuunitsDefault}',
+	    },
 	    deleteEmpty: true,
 	    allowBlank: true,
 	},
@@ -239,11 +248,19 @@ Ext.define('PVE.qemu.ProcessorInputPanel', {
 
 Ext.define('PVE.qemu.ProcessorEdit', {
     extend: 'Proxmox.window.Edit',
+    alias: 'widget.pveQemuProcessorEdit',
 
     width: 700,
 
+    viewModel: {
+	data: {
+	    cgroupMode: 2,
+	},
+    },
+
     initComponent: function() {
-	var me = this;
+	let me = this;
+	me.getViewModel().set('cgroupMode', me.cgroupMode);
 
 	var ipanel = Ext.create('PVE.qemu.ProcessorInputPanel');
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (7 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH v2 manager 3/3] ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-17 11:52   ` [pve-devel] applied: " Thomas Lamprecht
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 01/11] env: add get_static_node_stats() method Fiona Ebner
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Initially, with a setting for HA to switch between basic (just count
services) and static (use static node and resource information).
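
A resulting datacenter.cfg line then looks like (see also the docs patch
at the end of this series):

    crs: ha=static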

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 data/PVE/DataCenterConfig.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/data/PVE/DataCenterConfig.pm b/data/PVE/DataCenterConfig.pm
index 6a2adee..fcc9684 100644
--- a/data/PVE/DataCenterConfig.pm
+++ b/data/PVE/DataCenterConfig.pm
@@ -7,6 +7,18 @@ use PVE::JSONSchema qw(parse_property_string);
 use PVE::Tools;
 use PVE::Cluster;
 
+my $crs_format = {
+    ha => {
+	type => 'string',
+	enum => ['basic', 'static'],
+	description => "Use this resource scheduler mode for HA.",
+	default => 'basic',
+	verbose_description => "Configures how the HA manager should select nodes to start or ".
+	    "recover services. With 'basic', only the number of services is used, with 'static', ".
+	    "static CPU and memory configuration of services is considered.",
+    },
+};
+
 my $migration_format = {
     type => {
 	default_key => 1,
@@ -135,6 +147,11 @@ my $datacenter_schema = {
     type => "object",
     additionalProperties => 0,
     properties => {
+	crs => {
+	    optional => 1,
+	    type => 'string', format => $crs_format,
+	    description => "Cluster resource scheduling settings.",
+	},
 	keyboard => {
 	    optional => 1,
 	    type => 'string',
@@ -280,6 +297,10 @@ sub parse_datacenter_config {
 
     $res->{description} = $comment;
 
+    if (my $crs = $res->{crs}) {
+	$res->{crs} = parse_property_string($crs_format, $crs);
+    }
+
     if (my $migration = $res->{migration}) {
 	$res->{migration} = parse_property_string($migration_format, $migration);
     }
@@ -332,6 +353,10 @@ sub write_datacenter_config {
 	$cfg->{console} = 'html5';
     }
 
+    if (ref(my $crs = $cfg->{crs})) {
+	$cfg->{crs} = PVE::JSONSchema::print_property_string($crs, $crs_format);
+    }
+
     if (ref(my $migration = $cfg->{migration})) {
 	$cfg->{migration} = PVE::JSONSchema::print_property_string($migration, $migration_format);
     }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 01/11] env: add get_static_node_stats() method
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (8 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method Fiona Ebner
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

to be used for static resource scheduling.
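
The returned structure is keyed by node name, illustratively (made-up
values, 'cgroup-mode' being optional):

    {
        node1 => { cpus => 8, memory => 68719476736, 'cgroup-mode' => 2 },
        node2 => { cpus => 4, memory => 34359738368 },
    }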

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Env.pm         |  6 ++++++
 src/PVE/HA/Env/PVE2.pm    | 13 +++++++++++++
 src/PVE/HA/Sim/TestEnv.pm |  6 ++++++
 3 files changed, 25 insertions(+)

diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index ac569a9..00e3e3c 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -269,4 +269,10 @@ sub get_ha_settings {
     return $self->{plug}->get_ha_settings();
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    return $self->{plug}->get_static_node_stats();
+}
+
 1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 5e0a683..7cecf35 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -5,6 +5,7 @@ use warnings;
 use POSIX qw(:errno_h :fcntl_h);
 use IO::File;
 use IO::Socket::UNIX;
+use JSON;
 
 use PVE::SafeSyslog;
 use PVE::Tools;
@@ -459,4 +460,16 @@ sub get_ha_settings {
     return $datacenterconfig->{ha};
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    my $stats = PVE::Cluster::get_node_kv('static-info');
+    for my $node (keys $stats->%*) {
+	$stats->{$node} = eval { decode_json($stats->{$node}) };
+	$self->log('err', "unable to decode static node info for '$node' - $@") if $@;
+    }
+
+    return $stats;
+}
+
 1;
diff --git a/src/PVE/HA/Sim/TestEnv.pm b/src/PVE/HA/Sim/TestEnv.pm
index b448d72..e12d761 100644
--- a/src/PVE/HA/Sim/TestEnv.pm
+++ b/src/PVE/HA/Sim/TestEnv.pm
@@ -118,4 +118,10 @@ sub get_max_workers {
     return 0;
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    return $self->{hardware}->{'static-node-stats'};
+}
+
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (9 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 01/11] env: add get_static_node_stats() method Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-15 13:28   ` Thomas Lamprecht
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 03/11] add Usage base plugin and Usage::Basic plugin Fiona Ebner
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

to be used for static resource scheduling.

In a container's vmstatus(), the 'cores' option takes precedence over
the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
here.
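
A minimal sketch of the two precedence orders (illustrative, not the
literal vmstatus() code):

    # HA static stats (this patch): an explicit cpulimit wins
    my $maxcpu = $conf->{cpulimit} || $conf->{cores} || 0;
    # container vmstatus(): the core count wins
    my $cpus = $conf->{cores} || $conf->{cpulimit} || 0;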

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Resources.pm       |  5 +++++
 src/PVE/HA/Resources/PVECT.pm | 11 +++++++++++
 src/PVE/HA/Resources/PVEVM.pm | 14 ++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 835c314..f7eaff2 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -161,6 +161,11 @@ sub remove_locks {
     die "implement in subclass";
 }
 
+sub get_static_stats {
+    my ($class, $id, $service_node) = @_;
+
+    die "implement in subclass";
+}
 
 # package PVE::HA::Resources::IPAddr;
 
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index 015faf3..8209d9c 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -150,4 +150,15 @@ sub remove_locks {
     return undef;
 }
 
+sub get_static_stats {
+    my ($class, $id, $service_node) = @_;
+
+    my $conf = PVE::LXC::Config->load_config($id, $service_node);
+
+    return {
+	maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
+	maxmem => ($conf->{memory} || 512) * 1024 * 1024,
+    };
+}
+
 1;
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index 58c83e0..85196ed 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -173,4 +173,18 @@ sub remove_locks {
     return undef;
 }
 
+sub get_static_stats {
+    my ($class, $id, $service_node) = @_;
+
+    my $conf = PVE::QemuConfig->load_config($id, $service_node);
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    my $cpus = ($conf->{sockets} || $defaults->{sockets}) * ($conf->{cores} || $defaults->{cores});
+
+    return {
+	maxcpu => $conf->{vcpus} || $cpus,
+	maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,
+    };
+}
+
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 03/11] add Usage base plugin and Usage::Basic plugin
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (10 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters Fiona Ebner
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

in preparation for also supporting static resource scheduling via
another such Usage plugin.

The interface is designed in anticipation of the Usage::Static plugin;
the Usage::Basic plugin doesn't require all parameters.

In Usage::Static, the $haenv will be necessary for logging and getting
the static node stats. add_service_usage_to_node() and
score_nodes_to_start_service() take the sid and service node; the
former also takes the optional migration target (during a migration
it's not clear whether the config file has already been moved or not),
so that the static service stats can be looked up.
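
A rough sketch of how the manager is expected to drive such a plugin
(mirroring what recompute_online_node_usage() does later in this series):

    my $usage = PVE::HA::Usage::Basic->new($haenv);
    $usage->add_node($_) for $online_nodes->@*;
    $usage->add_service_usage_to_node('node1', 'vm:100', 'node1');
    # lower score is better, ties are broken by node name
    my $scores = $usage->score_nodes_to_start_service('vm:100', 'node1');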

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 debian/pve-ha-manager.install |  2 ++
 src/PVE/HA/Makefile           |  3 +-
 src/PVE/HA/Usage.pm           | 49 +++++++++++++++++++++++++++++++++
 src/PVE/HA/Usage/Basic.pm     | 52 +++++++++++++++++++++++++++++++++++
 src/PVE/HA/Usage/Makefile     |  6 ++++
 5 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile

diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 33a5c58..87fb24c 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,5 +33,7 @@
 /usr/share/perl5/PVE/HA/Resources/PVECT.pm
 /usr/share/perl5/PVE/HA/Resources/PVEVM.pm
 /usr/share/perl5/PVE/HA/Tools.pm
+/usr/share/perl5/PVE/HA/Usage.pm
+/usr/share/perl5/PVE/HA/Usage/Basic.pm
 /usr/share/perl5/PVE/Service/pve_ha_crm.pm
 /usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index c366f6c..8c91b97 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,5 +1,5 @@
 SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
-	NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm
+	NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
 
 SOURCES=${SIM_SOURCES} Config.pm
 
@@ -8,6 +8,7 @@ install:
 	install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
 	for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
 	make -C Resources install
+	make -C Usage install
 	make -C Env install
 
 .PHONY: installsim
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
new file mode 100644
index 0000000..4c723d1
--- /dev/null
+++ b/src/PVE/HA/Usage.pm
@@ -0,0 +1,49 @@
+package PVE::HA::Usage;
+
+use strict;
+use warnings;
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    die "implement in subclass";
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    die "implement in subclass";
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    die "implement in subclass";
+}
+
+# Returns a hash with $nodename => $score pairs. A lower $score is better.
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    die "implement in subclass";
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
new file mode 100644
index 0000000..f066350
--- /dev/null
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -0,0 +1,52 @@
+package PVE::HA::Usage::Basic;
+
+use strict;
+use warnings;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    return bless {
+	nodes => {},
+    }, $class;
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    $self->{nodes}->{$nodename} = 0;
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    delete $self->{nodes}->{$nodename};
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    return keys $self->{nodes}->%*;
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    return defined($self->{nodes}->{$nodename});
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    $self->{nodes}->{$nodename}++;
+}
+
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    return $self->{nodes};
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
new file mode 100644
index 0000000..ccf1282
--- /dev/null
+++ b/src/PVE/HA/Usage/Makefile
@@ -0,0 +1,6 @@
+SOURCES=Basic.pm
+
+.PHONY: install
+install:
+	install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Usage
+	for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Usage/$$i; done
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (11 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 03/11] add Usage base plugin and Usage::Basic plugin Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-16  7:17   ` Thomas Lamprecht
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 05/11] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

In preparation for scheduling based on static information, where the
scoring of nodes depends on information from the service's
VM/CT configuration file (and the $sid is required to query that).
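
The new signature is thus:

    select_service_node($groups, $online_node_usage, $sid, $service_conf,
        $current_node, $try_next, $tried_nodes, $maintenance_fallback)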

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Manager.pm      | 4 +++-
 src/test/test_failover1.pl | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 518f64f..63c94af 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -119,7 +119,7 @@ sub get_node_priority_groups {
 }
 
 sub select_service_node {
-    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
+    my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
 
     my $group = get_service_group($groups, $online_node_usage, $service_conf);
 
@@ -766,6 +766,7 @@ sub next_state_started {
 	    my $node = select_service_node(
 	        $self->{groups},
 		$self->{online_node_usage},
+		$sid,
 		$cd,
 		$sd->{node},
 		$try_next,
@@ -847,6 +848,7 @@ sub next_state_recovery {
     my $recovery_node = select_service_node(
 	$self->{groups},
 	$self->{online_node_usage},
+	$sid,
 	$cd,
 	$sd->{node},
     );
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 67573a2..f11d1a6 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -30,7 +30,7 @@ sub test {
     my ($expected_node, $try_next) = @_;
     
     my $node = PVE::HA::Manager::select_service_node
-	($groups, $online_node_usage, $service_conf, $current_node, $try_next);
+	($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
 
     my (undef, undef, $line) = caller();
     die "unexpected result: $node != ${expected_node} at line $line\n" 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 05/11] manager: online node usage: switch to Usage::Basic plugin
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (12 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin Fiona Ebner
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

no functional change is intended.

One test needs adaptation too, because it created its own version of
$online_node_usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Manager.pm      | 35 +++++++++++++++++------------------
 src/test/test_failover1.pl | 19 ++++++++++---------
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63c94af..63e6c8a 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -7,6 +7,7 @@ use Digest::MD5 qw(md5_base64);
 use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
+use PVE::HA::Usage::Basic;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -77,9 +78,7 @@ sub get_service_group {
 
     my $group = {};
     # add all online nodes to default group to allow try_next when no group set
-    foreach my $node (keys %$online_node_usage) {
-	$group->{nodes}->{$node} = 1;
-    }
+    $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
 
     # overwrite default if service is bound to a specific group
     if (my $group_id = $service_conf->{group}) {
@@ -100,7 +99,7 @@ sub get_node_priority_groups {
 	if ($entry =~ m/^(\S+):(\d+)$/) {
 	    ($node, $pri) = ($1, $2);
 	}
-	next if !defined($online_node_usage->{$node}); # offline
+	next if !$online_node_usage->contains_node($node); # offline
 	$pri_groups->{$pri}->{$node} = 1;
 	$group_members->{$node} = $pri;
     }
@@ -108,7 +107,7 @@ sub get_node_priority_groups {
     # add non-group members to unrestricted groups (priority -1)
     if (!$group->{restricted}) {
 	my $pri = -1;
-	foreach my $node (keys %$online_node_usage) {
+	for my $node ($online_node_usage->list_nodes()) {
 	    next if defined($group_members->{$node});
 	    $pri_groups->{$pri}->{$node} = 1;
 	    $group_members->{$node} = -1;
@@ -144,8 +143,9 @@ sub select_service_node {
 	}
     }
 
+    my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
-	$online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b
+	$scores->{$a} <=> $scores->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
 
     my $found;
@@ -201,39 +201,38 @@ my $valid_service_states = {
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = {};
+    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    foreach my $node (@$online_nodes) {
-	$online_node_usage->{$node} = 0;
-    }
+    $online_node_usage->add_node($_) for $online_nodes->@*;
 
     foreach my $sid (keys %{$self->{ss}}) {
 	my $sd = $self->{ss}->{$sid};
 	my $state = $sd->{state};
 	my $target = $sd->{target}; # optional
-	if (defined($online_node_usage->{$sd->{node}})) {
+	if ($online_node_usage->contains_node($sd->{node})) {
 	    if (
 		$state eq 'started' || $state eq 'request_stop' || $state eq 'fence' ||
 		$state eq 'freeze' || $state eq 'error' || $state eq 'recovery'
 	    ) {
-		$online_node_usage->{$sd->{node}}++;
+		$online_node_usage->add_service_usage_to_node($sd->{node}, $sid, $sd->{node});
 	    } elsif (($state eq 'migrate') || ($state eq 'relocate')) {
+		my $source = $sd->{node};
 		# count it for both, source and target as load is put on both
-		$online_node_usage->{$sd->{node}}++;
-		$online_node_usage->{$target}++;
+		$online_node_usage->add_service_usage_to_node($source, $sid, $source, $target);
+		$online_node_usage->add_service_usage_to_node($target, $sid, $source, $target);
 	    } elsif ($state eq 'stopped') {
 		# do nothing
 	    } else {
 		die "should not be reached (sid = '$sid', state = '$state')";
 	    }
-	} elsif (defined($target) && defined($online_node_usage->{$target})) {
+	} elsif (defined($target) && $online_node_usage->contains_node($target)) {
 	    if ($state eq 'migrate' || $state eq 'relocate') {
 		# to correctly track maintenance modi and also consider the target as used for the
 		# case a node dies, as we cannot really know if the to-be-aborted incoming migration
 		# has already cleaned up all used resources
-		$online_node_usage->{$target}++;
+		$online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node}, $target);
 	    }
 	}
     }
@@ -775,7 +774,7 @@ sub next_state_started {
 	    );
 
 	    if ($node && ($sd->{node} ne $node)) {
-		$self->{online_node_usage}->{$node}++;
+		$self->{online_node_usage}->add_service_usage_to_node($node, $sid, $sd->{node});
 
 		if (defined(my $fallback = $sd->{maintenance_node})) {
 		    if ($node eq $fallback) {
@@ -864,7 +863,7 @@ sub next_state_recovery {
 	$fence_recovery_cleanup->($self, $sid, $fenced_node);
 
 	$haenv->steal_service($sid, $sd->{node}, $recovery_node);
-	$self->{online_node_usage}->{$recovery_node}++;
+	$self->{online_node_usage}->add_service_usage_to_node($recovery_node, $sid, $recovery_node);
 
 	# NOTE: $sd *is normally read-only*, fencing is the exception
 	$cd->{node} = $sd->{node} = $recovery_node;
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index f11d1a6..308eab3 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -6,6 +6,7 @@ use warnings;
 use lib '..';
 use PVE::HA::Groups;
 use PVE::HA::Manager;
+use PVE::HA::Usage::Basic;
 
 my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
 group: prefer_node1
@@ -13,11 +14,11 @@ group: prefer_node1
 EOD
 
 
-my $online_node_usage = {
-    node1 => 0,
-    node2 => 0,
-    node3 => 0,
-};
+# Relies on the fact that the basic plugin doesn't use the haenv.
+my $online_node_usage = PVE::HA::Usage::Basic->new();
+$online_node_usage->add_node("node1");
+$online_node_usage->add_node("node2");
+$online_node_usage->add_node("node3");
 
 my $service_conf = {
     node => 'node1',
@@ -43,22 +44,22 @@ sub test {
 test('node1');
 test('node1', 1);
 
-delete $online_node_usage->{node1}; # poweroff
+$online_node_usage->remove_node("node1"); # poweroff
 
 test('node2');
 test('node3', 1);
 test('node2', 1);
 
-delete $online_node_usage->{node2}; # poweroff
+$online_node_usage->remove_node("node2"); # poweroff
 
 test('node3');
 test('node3', 1);
 
-$online_node_usage->{node1} = 0; # poweron
+$online_node_usage->add_node("node1"); # poweron
 
 test('node1');
 
-$online_node_usage->{node2} = 0; # poweron
+$online_node_usage->add_node("node2"); # poweron
 
 test('node1');
 test('node1', 1);
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (13 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 05/11] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-15 15:55   ` DERUMIER, Alexandre
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method Fiona Ebner
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

for calculating node usage of services based upon static CPU and
memory configuration as well as scoring the nodes with that
information to decide where to start a new or recovered service.
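
The scores from the Rust backend come back as (node, score) pairs where
higher is better; the plugin negates them so that the existing "lower is
better" sorting in select_service_node() keeps working. Illustratively:

    # [ ['node2', 0.97], ['node1', 0.58] ] is turned into
    # { node2 => -0.97, node1 => -0.58 }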

For getting the service stats, it's necessary to also consider the
migration target (if present), because the configuration file might
have already been moved.

It's necessary to update the cluster filesystem upon stealing the
service to be able to always read the moved config right away when
adding the usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

For add_service_usage_to_node(), not sure if the callers should rather
handle the error. But I'd make them just do the same, i.e. log warning
and continue.

 debian/pve-ha-manager.install |   1 +
 src/PVE/HA/Env/PVE2.pm        |   4 ++
 src/PVE/HA/Usage.pm           |   1 +
 src/PVE/HA/Usage/Makefile     |   2 +-
 src/PVE/HA/Usage/Static.pm    | 114 ++++++++++++++++++++++++++++++++++
 5 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/HA/Usage/Static.pm

diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 87fb24c..a7598a9 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -35,5 +35,6 @@
 /usr/share/perl5/PVE/HA/Tools.pm
 /usr/share/perl5/PVE/HA/Usage.pm
 /usr/share/perl5/PVE/HA/Usage/Basic.pm
+/usr/share/perl5/PVE/HA/Usage/Static.pm
 /usr/share/perl5/PVE/Service/pve_ha_crm.pm
 /usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7cecf35..7fac43c 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -176,6 +176,10 @@ sub steal_service {
     } else {
 	die "implement me";
     }
+
+    # Necessary for (at least) the static usage plugin to always be able to read the service
+    # config from the new node right away.
+    $self->cluster_state_update();
 }
 
 sub read_group_config {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 4c723d1..66d9572 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,6 +33,7 @@ sub contains_node {
     die "implement in subclass";
 }
 
+# Logs a warning to $haenv upon failure, but does not die.
 sub add_service_usage_to_node {
     my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
 
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index ccf1282..5a51359 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Basic.pm
+SOURCES=Basic.pm Static.pm
 
 .PHONY: install
 install:
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
new file mode 100644
index 0000000..78883aa
--- /dev/null
+++ b/src/PVE/HA/Usage/Static.pm
@@ -0,0 +1,114 @@
+package PVE::HA::Usage::Static;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Static;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    my $node_stats = eval { $haenv->get_static_node_stats() };
+    die "did not get static node usage information - $@" if $@;
+
+    my $scheduler = eval { PVE::RS::ResourceScheduling::Static->new(); };
+    die "unable to initialize static scheduling - $@" if $@;
+
+    return bless {
+	'node-stats' => $node_stats,
+	'service-stats' => {},
+	haenv => $haenv,
+	scheduler => $scheduler,
+    }, $class;
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    my $stats = $self->{'node-stats'}->{$nodename}
+	or die "did not get static node usage information for '$nodename'\n";
+    die "static node usage information for '$nodename' missing cpu count\n" if !$stats->{cpus};
+    die "static node usage information for '$nodename' missing memory\n" if !$stats->{memory};
+
+    eval { $self->{scheduler}->add_node($nodename, int($stats->{cpus}), int($stats->{memory})); };
+    die "initializing static node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+    my ($self, $sid, $service_node, $migration_target) = @_;
+
+    return $self->{'service-stats'}->{$sid} if $self->{'service-stats'}->{$sid};
+
+    my (undef, $type, $id) = $self->{haenv}->parse_sid($sid);
+    my $plugin = PVE::HA::Resources->lookup($type);
+
+    my $stats = eval { $plugin->get_static_stats($id, $service_node); };
+    if (my $err = $@) {
+	# config might've already moved during a migration
+	$stats = eval { $plugin->get_static_stats($id, $migration_target); } if $migration_target;
+	die "did not get static service usage information for '$sid' - $err\n" if !$stats;
+    }
+
+    my $service_stats = {
+	maxcpu => $stats->{maxcpu} + 0.0, # containers allow non-integer cpulimit
+	maxmem => int($stats->{maxmem}),
+    };
+
+    $self->{'service-stats'}->{$sid} = $service_stats;
+
+    return $service_stats;
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    eval {
+	my $service_usage = get_service_usage($self, $sid, $service_node, $migration_target);
+	$self->{scheduler}->add_service_usage_to_node($nodename, $service_usage);
+    };
+    $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
+	if $@;
+}
+
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    my $score_list = eval {
+	my $service_usage = get_service_usage($self, $sid, $service_node);
+	$self->{scheduler}->score_nodes_to_start_service($service_usage);
+    };
+    if (my $err = $@) {
+	$self->{haenv}->log(
+	    'err',
+	    "unable to score nodes according to static usage for service '$sid' - $err",
+	);
+	# TODO maybe use service count as fallback?
+	return { map { $_ => 1 } $self->list_nodes() };
+    }
+
+    # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+    return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (14 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-16  7:05   ` Thomas Lamprecht
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 08/11] manager: set resource scheduler mode upon init Fiona Ebner
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

for reading the resource scheduler settings.
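
With the crs format from the pve-cluster patch, the returned value is the
parsed property string, e.g.:

    { ha => 'static' }    # or undef when unset, {} when unreadable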

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Env.pm      |  7 +++++++
 src/PVE/HA/Env/PVE2.pm | 12 ++++++++++++
 src/PVE/HA/Sim/Env.pm  |  9 +++++++++
 3 files changed, 28 insertions(+)

diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 00e3e3c..c014ff7 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -269,6 +269,13 @@ sub get_ha_settings {
     return $self->{plug}->get_ha_settings();
 }
 
+# return cluster wide resource scheduling settings
+sub get_crs_settings {
+    my ($self) = @_;
+
+    return $self->{plug}->get_crs_settings();
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7fac43c..e8f746c 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -464,6 +464,18 @@ sub get_ha_settings {
     return $datacenterconfig->{ha};
 }
 
+sub get_crs_settings {
+    my ($self) = @_;
+
+    my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') };
+    if (my $err = $@) {
+	$self->log('err', "unable to get CRS settings from datacenter.cfg - $err");
+	return {};
+    }
+
+    return $datacenterconfig->{crs};
+}
+
 sub get_static_node_stats {
     my ($self) = @_;
 
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index b286708..967c031 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -433,4 +433,13 @@ sub get_ha_settings {
     return $datacenterconfig->{ha};
 }
 
+# return cluster wide resource scheduling settings
+sub get_crs_settings {
+    my ($self) = @_;
+
+    my $datacenterconfig = $self->{hardware}->read_datacenter_conf();
+
+    return $datacenterconfig->{crs};
+}
+
 1;
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 08/11] manager: set resource scheduler mode upon init
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (15 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured Fiona Ebner
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Manager.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63e6c8a..20b743c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -52,6 +52,11 @@ sub new {
 
     $self->{ms} = { master_node => $haenv->nodename() };
 
+    my $crs_settings = $haenv->get_crs_settings();
+    $self->{'scheduler-mode'} = $crs_settings ? $crs_settings->{ha} : 'basic';
+    $haenv->log('info', "using scheduler mode '$self->{'scheduler-mode'}'")
+	if $self->{'scheduler-mode'} ne 'basic';
+
     return $self;
 }
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (16 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 08/11] manager: set resource scheduler mode upon init Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-11  9:28   ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 10/11] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Manager.pm | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 20b743c..7c85b67 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,7 @@ use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
 use PVE::HA::Usage::Basic;
+use PVE::HA::Usage::Static;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -206,11 +207,30 @@ my $valid_service_states = {
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
+    my $haenv = $self->{haenv};
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    $online_node_usage->add_node($_) for $online_nodes->@*;
+    my $online_node_usage;
+
+    if (my $mode = $self->{'scheduler-mode'}) {
+	if ($mode eq 'static') {
+	    $online_node_usage = eval {
+		my $scheduler = PVE::HA::Usage::Static->new($haenv);
+		$scheduler->add_node($_) for $online_nodes->@*;
+		return $scheduler;
+	    };
+	    $haenv->log('warning', "using 'basic' scheduler mode, init for 'static' failed - $@")
+		if $@;
+	} elsif ($mode ne 'basic') {
+	    $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
+	}
+    }
+
+    if (!$online_node_usage) {
+	$online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+	$online_node_usage->add_node($_) for $online_nodes->@*;
+    }
 
     foreach my $sid (keys %{$self->{ss}}) {
 	my $sd = $self->{ss}->{$sid};
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 10/11] manager: avoid scoring nodes if maintenance fallback node is valid
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (17 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 11/11] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/HA/Manager.pm | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 7c85b67..b94ba9d 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,25 +149,20 @@ sub select_service_node {
 	}
     }
 
+    return $maintenance_fallback
+	if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+
     my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
 	$scores->{$a} <=> $scores->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
 
     my $found;
-    my $found_maintenance_fallback;
     for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
 	my $node = $nodes[$i];
 	if ($node eq $current_node) {
 	    $found = $i;
 	}
-	if (defined($maintenance_fallback) && $node eq $maintenance_fallback) {
-	    $found_maintenance_fallback = $i;
-	}
-    }
-
-    if (defined($found_maintenance_fallback)) {
-	return $nodes[$found_maintenance_fallback];
     }
 
     if ($try_next) {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH ha-manager 11/11] manager: avoid scoring nodes when not trying next and current node is valid
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (18 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 10/11] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
@ 2022-11-10 14:37 ` Fiona Ebner
  2022-11-10 14:38 ` [pve-devel] [PATCH docs 1/1] ha: add section about scheduler modes Fiona Ebner
  2022-11-15 13:12 ` [pve-devel] partially-applied: [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Thomas Lamprecht
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:37 UTC (permalink / raw)
  To: pve-devel

With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

This should cover most calls of select_service_node().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

I think the whole $found logic can be dropped *if* we ensure that
$tried_nodes and $try_next go hand in hand, because all the tried
nodes are then already excluded as candidates. But this has
to be evaluated more carefully and is something for after the release
of course :) The test_failover1.pl test doesn't set failed nodes at
all and would be the only one to break.

 src/PVE/HA/Manager.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index b94ba9d..9d36ea8 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -152,6 +152,8 @@ sub select_service_node {
     return $maintenance_fallback
 	if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
 
+    return $current_node if !$try_next && $pri_groups->{$top_pri}->{$current_node};
+
     my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
 	$scores->{$a} <=> $scores->{$b} || $a cmp $b
@@ -171,8 +173,6 @@ sub select_service_node {
 	} else {
 	    return $nodes[0];
 	}
-    } elsif (defined($found)) {
-	return $nodes[$found];
     } else {
 	return $nodes[0];
     }
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] [PATCH docs 1/1] ha: add section about scheduler modes
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (19 preceding siblings ...)
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 11/11] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
@ 2022-11-10 14:38 ` Fiona Ebner
  2022-11-15 13:12 ` [pve-devel] partially-applied: [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Thomas Lamprecht
  21 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-10 14:38 UTC (permalink / raw)
  To: pve-devel

briefly describing the 'basic' and 'static' modes, with a note
mentioning plans for balancers.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 ha-manager.adoc | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 54db2a5..dfed393 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -933,6 +933,33 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
 immediate node reboot or even reset.
 
 
+Scheduler Mode
+--------------
+
+The scheduler mode controls how HA selects nodes during recovery of a service.
+The default mode is `basic`; you can change it in `datacenter.cfg`:
+
+----
+crs: ha=static
+----
+
+For each service that needs to be recovered, the scheduler iteratively chooses
+the best node among the nodes with the highest priority in the service's group.
+
+NOTE: There are plans to add modes for (static and dynamic) load-balancing in
+the future.
+
+Basic
+^^^^^
+
+The number of active HA services on each node is used to choose a recovery node.
+
+Static
+^^^^^^
+
+Static usage information from HA services on each node is used to choose a
+recovery node. In particular, CPU and memory configuration is considered.
+
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
 endif::manvolnum[]
-- 
2.30.2





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured Fiona Ebner
@ 2022-11-11  9:28   ` Fiona Ebner
  2022-11-16  7:14     ` Thomas Lamprecht
  0 siblings, 1 reply; 41+ messages in thread
From: Fiona Ebner @ 2022-11-11  9:28 UTC (permalink / raw)
  To: pve-devel

On 10.11.22 at 15:37, Fiona Ebner wrote:
> @@ -206,11 +207,30 @@ my $valid_service_states = {
>  sub recompute_online_node_usage {
So I was a bit worried that recompute_online_node_usage() would become
too inefficient with the new add_service_usage_to_node() overhead from
needing to read the guest configs. I now tested it with ~300 HA services
(minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

While about a 10-fold increase, it's not too dramatic at least. I guess
that's what the caching of cfs files is for :)

Still, the function is currently not only called in the main loop in
manage(), but also in next_state_recovery() and change_service_state().

With, say, 400 HA services each on 5 nodes, if a node fails there's
400 calls from changing to freeze
400 calls from changing to recovery
400 calls in next_state_recovery
400 calls from changing to started
If we take a generous estimate that each call takes 0.1 seconds (there's
2000 services in total), that's 40+80+40 seconds in 3 bursts during the
fencing and recovery period.
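That is: one 40 s burst (freeze), one 80 s burst (recovery +
next_state_recovery) and another 40 s burst (started).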

Is that acceptable? Should I try to optimize how often the function is
called?




^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] applied: [PATCH proxmox-resource-scheduling 1/3] initial commit
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
@ 2022-11-15 10:15   ` Wolfgang Bumiller
  2022-11-15 15:39   ` [pve-devel] " DERUMIER, Alexandre
  1 sibling, 0 replies; 41+ messages in thread
From: Wolfgang Bumiller @ 2022-11-15 10:15 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

repo created & packaged, thanks




^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] applied-series: [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module Fiona Ebner
@ 2022-11-15 10:16   ` Wolfgang Bumiller
  0 siblings, 0 replies; 41+ messages in thread
From: Wolfgang Bumiller @ 2022-11-15 10:16 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: pve-devel

applied & bumped, thanks
will take a look at how to make the tests easier to run




^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] partially-applied: [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager
  2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
                   ` (20 preceding siblings ...)
  2022-11-10 14:38 ` [pve-devel] [PATCH docs 1/1] ha: add section about scheduler modes Fiona Ebner
@ 2022-11-15 13:12 ` Thomas Lamprecht
  21 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-15 13:12 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

On 10/11/2022 at 15:37, Fiona Ebner wrote:
> pve-manager:
> 
> Fiona Ebner (3):
>   pvestatd: broadcast static node information
>   cluster resources: add cgroup-mode to node properties
>   ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version

applied these three




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method Fiona Ebner
@ 2022-11-15 13:28   ` Thomas Lamprecht
  2022-11-16  8:46     ` Fiona Ebner
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-15 13:28 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
> to be used for static resource scheduling.
> 
> In container's vmstatus(), the 'cores' option takes precedence over
> the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
> here.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  src/PVE/HA/Resources.pm       |  5 +++++
>  src/PVE/HA/Resources/PVECT.pm | 11 +++++++++++
>  src/PVE/HA/Resources/PVEVM.pm | 14 ++++++++++++++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
> index 835c314..f7eaff2 100644
> --- a/src/PVE/HA/Resources.pm
> +++ b/src/PVE/HA/Resources.pm
> @@ -161,6 +161,11 @@ sub remove_locks {
>      die "implement in subclass";
>  }
>  
> +sub get_static_stats {
> +    my ($class, $id, $service_node) = @_;
> +
> +    die "implement in subclass";
> +}
>  
>  # package PVE::HA::Resources::IPAddr;
>  
> diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
> index 015faf3..8209d9c 100644
> --- a/src/PVE/HA/Resources/PVECT.pm
> +++ b/src/PVE/HA/Resources/PVECT.pm
> @@ -150,4 +150,15 @@ sub remove_locks {
>      return undef;
>  }
>  
> +sub get_static_stats {
> +    my ($class, $id, $service_node) = @_;
> +
> +    my $conf = PVE::LXC::Config->load_config($id, $service_node);
> +
> +    return {
> +	maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
> +	maxmem => ($conf->{memory} || 512) * 1024 * 1024,

nit, would prefer one of snake or kebab case for those keys (no hard feelings on which
one, it should just be consistent within similar, i.e. the stats, stuff).

> +    };
> +}
> +
>  1;
> diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
> index 58c83e0..85196ed 100644
> --- a/src/PVE/HA/Resources/PVEVM.pm
> +++ b/src/PVE/HA/Resources/PVEVM.pm
> @@ -173,4 +173,18 @@ sub remove_locks {
>      return undef;
>  }
>  
> +sub get_static_stats {
> +    my ($class, $id, $service_node) = @_;
> +
> +    my $conf = PVE::QemuConfig->load_config($id, $service_node);

maybe it could be worth using the CFS_IPC_GET_GUEST_CONFIG_PROPERTY successor that
Dominik developed for the tags stuff for this, once applied. It can still be switched
to transparently if this turns out to be a bottleneck in the future, so just mentioning
it for completeness' sake.

> +    my $defaults = PVE::QemuServer::load_defaults();
> +
> +    my $cpus = ($conf->{sockets} || $defaults->{sockets}) * ($conf->{cores} || $defaults->{cores});
> +
> +    return {
> +	maxcpu => $conf->{vcpus} || $cpus,
> +	maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,

same here. As this is just internal we can also adapt it later though..

> +    };
> +}
> +
>  1;





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
  2022-11-15 10:15   ` [pve-devel] applied: " Wolfgang Bumiller
@ 2022-11-15 15:39   ` DERUMIER, Alexandre
  2022-11-16  9:09     ` Fiona Ebner
  1 sibling, 1 reply; 41+ messages in thread
From: DERUMIER, Alexandre @ 2022-11-15 15:39 UTC (permalink / raw)
  To: pve-devel

Thanks Fiona for your hard work on this !

I'm going to review/test them this week.

I'm not an expert in Rust, but I think I'll be able to read the code


Just a question: how do you choose the weights of the different criteria?

I think in the second patch, I see:

+    static ref PVE_HA_TOPSIS_CRITERIA: TopsisCriteria<N_CRITERIA> = TopsisCriteria::new([
+        TopsisCriterion::new("average CPU".to_string(), -1.0),
+        TopsisCriterion::new("highest CPU".to_string(), -2.0),
+        TopsisCriterion::new("average memory".to_string(), -5.0),
+        TopsisCriterion::new("highest memory".to_string(), -10.0),
+    ])


Are these arbitrary values?

if you look at my previous patch series,
https://lists.proxmox.com/pipermail/pve-devel/2022-April/052779.html

I have also implemented the AHP algorithm (from AHP-TOPSIS), which is
really useful for finding weights for criteria once you start to have a
lot of them, by assigning pairwise priorities between the criteria in a
matrix.

here is a youtube video about the math:
https://www.youtube.com/watch?v=J4T70o8gjlk&t=456s


(I had implemented AHP to dynamically find the weights on service
start, but the weights could be computed once and set statically)
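
For reference, a minimal sketch (not taken from either patch series) of the
geometric-mean approximation commonly used in AHP to turn such a pairwise
comparison matrix into per-criterion weights:

    # $pairwise->[$i][$j] states how much more important criterion i is
    # than criterion j ($pairwise->[$j][$i] should be its reciprocal)
    sub ahp_weights {
        my ($pairwise) = @_;
        my $n = scalar(@$pairwise);

        my @means;
        for my $i (0 .. $n - 1) {
            my $product = 1;
            $product *= $pairwise->[$i][$_] for 0 .. $n - 1;
            push @means, $product ** (1 / $n); # geometric mean of row i
        }

        my $sum = 0;
        $sum += $_ for @means;
        return [ map { $_ / $sum } @means ]; # normalized weights
    }

For four criteria, that means choosing the 6 values above the diagonal (the
rest follows by reciprocity) instead of guessing 4 weights directly.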


Le jeudi 10 novembre 2022 à 15:37 +0100, Fiona Ebner a écrit :
> Implement the TOPSIS[0] algorithm to score multi-valued alternatives
> according to a given set of weighted criteria.
> 
> The number of alternatives cannot be known at compile time, but the
> number of criteria should be (a given module using the topsis module
> should have one (or more) fixed sets of criteria). Therefore, the
> TopsisMatrix is implemented as a Vec of N_CRITERIA-sized arrays.
> 
> Compared to the description in [0] the weighing of the matrix
> according to the weights of the criteria only happens during distance
> calculation to the idealized alternatives. It turned out more natural
> like that, because the matrix doesn't need to be mutable.
> 
>  
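
To make the quoted description a bit more concrete, here is a rough Perl
sketch of the TOPSIS flow (illustrative only, the actual implementation is
the Rust crate; matching the criteria above, a negative weight is assumed to
mark a criterion that should be minimized):

    # $matrix->[$alternative][$criterion] holds the raw values
    sub topsis_scores {
        my ($matrix, $weights) = @_;
        my $n_crit = scalar(@$weights);

        # vector-normalize each criterion column
        my @norm;
        for my $j (0 .. $n_crit - 1) {
            my $len = 0;
            $len += $_->[$j] ** 2 for @$matrix;
            $norm[$j] = sqrt($len) || 1;
        }

        # idealized alternatives: best and worst value per criterion
        my (@best, @worst);
        for my $j (0 .. $n_crit - 1) {
            my @col = sort { $a <=> $b } map { $_->[$j] / $norm[$j] } @$matrix;
            my $minimize = $weights->[$j] < 0;
            $best[$j] = $minimize ? $col[0] : $col[-1];
            $worst[$j] = $minimize ? $col[-1] : $col[0];
        }

        # the weights only enter here, during the distance calculation
        my @scores;
        for my $alternative (@$matrix) {
            my ($d_best, $d_worst) = (0, 0);
            for my $j (0 .. $n_crit - 1) {
                my $v = $alternative->[$j] / $norm[$j];
                my $w = abs($weights->[$j]);
                $d_best += ($w * ($v - $best[$j])) ** 2;
                $d_worst += ($w * ($v - $worst[$j])) ** 2;
            }
            my $denom = sqrt($d_best) + sqrt($d_worst);
            push @scores, $denom ? sqrt($d_worst) / $denom : 0;
        }
        return \@scores; # higher is better
    }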


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin Fiona Ebner
@ 2022-11-15 15:55   ` DERUMIER, Alexandre
  2022-11-16  9:10     ` Fiona Ebner
  0 siblings, 1 reply; 41+ messages in thread
From: DERUMIER, Alexandre @ 2022-11-15 15:55 UTC (permalink / raw)
  To: pve-devel


> +sub score_nodes_to_start_service {
> +    my ($self, $sid, $service_node) = @_;
> +
> +    my $score_list = eval {
> +       my $service_usage = get_service_usage($self, $sid, $service_node);
> +       $self->{scheduler}->score_nodes_to_start_service($service_usage);
> +    };
> +    if (my $err = $@) {
> +       $self->{haenv}->log(
> +           'err',
> +           "unable to score nodes according to static usage for service '$sid' - $err",
> +       );
> +       # TODO maybe use service count as fallback?

Maybe you could add "service count" as a criterion with the lowest
priority?

(I have done it like this, if I remember correctly)




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method Fiona Ebner
@ 2022-11-16  7:05   ` Thomas Lamprecht
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16  7:05 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
> for reading the resource scheduler settings.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  src/PVE/HA/Env.pm      |  7 +++++++
>  src/PVE/HA/Env/PVE2.pm | 12 ++++++++++++
>  src/PVE/HA/Sim/Env.pm  |  9 +++++++++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
> index 00e3e3c..c014ff7 100644
> --- a/src/PVE/HA/Env.pm
> +++ b/src/PVE/HA/Env.pm
> @@ -269,6 +269,13 @@ sub get_ha_settings {
>      return $self->{plug}->get_ha_settings();
>  }
>  
> +# return cluster wide resource scheduling settings
> +sub get_crs_settings {
> +    my ($self) = @_;
> +
> +    return $self->{plug}->get_crs_settings();
> +}
> +

we try to keep the Env footprint on the smaller side, if possible; I'd rather add
this to get_ha_settings, either as a tuple or, from a gut feeling, maybe better as
a hash.

To clarify that, we could rename it to get_datacenter_settings first. I'd still
limit it to the relevant settings to avoid info "leakage" that some future patch
then misuses so subtly that we don't notice.


It'd then return:

{
   ha => {}
   crs => {}
}
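
In PVE2.pm that could hypothetically look something like this (just a sketch,
error handling and the exact keys up for debate):

    sub get_datacenter_settings {
        my ($self) = @_;

        my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') } // {};
        $self->log('err', "unable to get datacenter config - $@") if $@;

        return {
            ha  => $datacenterconfig->{ha} // {},
            crs => $datacenterconfig->{crs} // {},
        };
    }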

fwiw, moving in max_workers would be an option then too, but no need for that in
this series.

maybe then also throw in a comment that changes are only picked up in between new
LRM rounds or on new Manager creation (due to there being no active one), for
stability purposes.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured
  2022-11-11  9:28   ` Fiona Ebner
@ 2022-11-16  7:14     ` Thomas Lamprecht
  2022-11-16  9:37       ` Fiona Ebner
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16  7:14 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 11/11/2022 um 10:28 schrieb Fiona Ebner:
> Am 10.11.22 um 15:37 schrieb Fiona Ebner:
>> @@ -206,11 +207,30 @@ my $valid_service_states = {
>>  sub recompute_online_node_usage {
> So I was a bit worried that recompute_online_node_usage() would become
> too inefficient with the new add_service_usage_to_node() overhead from
> needing to read the guest configs. I now tested it with ~300 HA services
> (minimal containers) running on my virtual test cluster.
> 
> Timings with 'basic' mode were between 0.0004 - 0.001 seconds
> Timings with 'static' mode were between 0.007 - 0.012 seconds
> 
> While about a 10-fold increase, it's not too dramatic at least. I guess
> that's what the caching of cfs files is for :)
> 
> Still, the function is currently not only called in the main loop in
> manage(), but also in next_state_recovery() and change_service_state().
> 
> With, say, 400 HA services each on 5 nodes, if a node fails there's
> 400 calls from changing to freeze

huh, freeze should only happen on graceful shutdown of a node, not
if it fails?

> 400 calls from changing to recovery
> 400 calls in next_state_recovery
> 400 calls from changing to started
> If we take a generous estimate that each call takes 0.1 seconds (there's
> 2000 services in total), that's 40+80+40 seconds in 3 bursts during the
> fencing and recovery period.

doesn't that lead to overly long run windows between watchdog updates?

> 
> Is that acceptable? Should I try to optimize how often the function is
> called?
> 

hmm, a quick look wouldn't hurt, but not required for now IMO - if it can
interfere with watchdog updates I'd sneak in updating it once in between
though.


ps. maybe you can have some of that info/stats here in the commit message
of this patch.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters
  2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters Fiona Ebner
@ 2022-11-16  7:17   ` Thomas Lamprecht
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16  7:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
> In preparation for scheduling based on static information, where the
> scoring of nodes depends on information from the service's
> VM/CT configuration file (and the $sid is required to query that).
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  src/PVE/HA/Manager.pm      | 4 +++-
>  src/test/test_failover1.pl | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 518f64f..63c94af 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -119,7 +119,7 @@ sub get_node_priority_groups {
>  }
>  
>  sub select_service_node {
> -    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
> +    my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;

ok for now, but we seriously need to clean this method signature up; as an idea
for some future patch:

$service -> { sid => .., conf => .., current_node => ... }
$affinity -> { groups => ..., usage => ..., tried_nodes => .., maintenance_fallback }

(but not $try_next, that is not an info struct but a control flag for the "algorithm
behavior", so it needs to stay separate).

It may then also be sensible to adapt the way we save that info in the manager $self,
but not all changes at once :)
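
A rough sketch of that future signature (hypothetical, just to illustrate the
suggested grouping):

    sub select_service_node {
        my ($service, $affinity, $try_next) = @_;

        my ($sid, $service_conf, $current_node) =
            @$service{qw(sid conf current_node)};
        my ($groups, $online_node_usage, $tried_nodes, $maintenance_fallback) =
            @$affinity{qw(groups usage tried_nodes maintenance_fallback)};

        # ... node selection logic unchanged ...
    }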




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-15 13:28   ` Thomas Lamprecht
@ 2022-11-16  8:46     ` Fiona Ebner
  2022-11-16  8:59       ` Thomas Lamprecht
  2022-11-16 12:38       ` DERUMIER, Alexandre
  0 siblings, 2 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-16  8:46 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion

Am 15.11.22 um 14:28 schrieb Thomas Lamprecht:
> Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
>> +    return {
>> +	maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
>> +	maxmem => ($conf->{memory} || 512) * 1024 * 1024,
> 
> nit, would prefer one of snake or kebab case for those keys (no hard feelings on which
> one, it should just be consistent within similar, i.e. the stats, stuff).
> 
>> +    };
>> +}
>> +
>>  1;
>> diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
>> index 58c83e0..85196ed 100644
>> --- a/src/PVE/HA/Resources/PVEVM.pm
>> +++ b/src/PVE/HA/Resources/PVEVM.pm
>> @@ -173,4 +173,18 @@ sub remove_locks {
>>      return undef;
>>  }
>>  
>> +sub get_static_stats {
>> +    my ($class, $id, $service_node) = @_;
>> +
>> +    my $conf = PVE::QemuConfig->load_config($id, $service_node);
> 
> maybe it could be worth using the CFS_IPC_GET_GUEST_CONFIG_PROPERTY successor that
> Dominik developed for the tags stuff for this, once applied. It can still be switched
> to transparently if this turns out to be a bottleneck in the future, so just mentioning
> it for completeness' sake.
> 

Sure, will be worth a test!

>> +    my $defaults = PVE::QemuServer::load_defaults();
>> +
>> +    my $cpus = ($conf->{sockets} || $defaults->{sockets}) * ($conf->{cores} || $defaults->{cores});
>> +
>> +    return {
>> +	maxcpu => $conf->{vcpus} || $cpus,
>> +	maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,
> 
> same here. As this is just internal we can also adapt it later though..
> 

Well, the Rust backend also uses 'maxcpu' and 'maxmem' currently :/
So at least in Usage/Static.pm, it will be more difficult to change later.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-16  8:46     ` Fiona Ebner
@ 2022-11-16  8:59       ` Thomas Lamprecht
  2022-11-16 12:38       ` DERUMIER, Alexandre
  1 sibling, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16  8:59 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 16/11/2022 um 09:46 schrieb Fiona Ebner:
>>> +    return {
>>> +	maxcpu => $conf->{vcpus} || $cpus,
>>> +	maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,
>> same here. As this is just internal we can also adapt it later though..
>>
> Well, the Rust backend also uses 'maxcpu' and 'maxmem' currently :/
> So at least in Usage/Static.pm, it will be more difficult to change later.

hmm, then we should switch now (or never speak of this again ;)). I'll survive
if it stays that way too; I just mentioned it because I find it harder to read
without real benefit.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit
  2022-11-15 15:39   ` [pve-devel] " DERUMIER, Alexandre
@ 2022-11-16  9:09     ` Fiona Ebner
  0 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-16  9:09 UTC (permalink / raw)
  To: pve-devel, DERUMIER, Alexandre

Am 15.11.22 um 16:39 schrieb DERUMIER, Alexandre:
> Thanks Fiona for your hard work on this !
> 
> I'm going to review/test them this week.
> 
> I'm not an expert in Rust, but I think I'll be able to read the code
> 

Thank you for pushing the idea in the first place! Let's hope we can
re-use this infrastructure to get to a real dynamic balancer down the
line :)

> 
> Just a question: how do you choose the weights of the different criteria?
> 
> I think in the second patch, I see:
> 
> +    static ref PVE_HA_TOPSIS_CRITERIA: TopsisCriteria<N_CRITERIA> = TopsisCriteria::new([
> +        TopsisCriterion::new("average CPU".to_string(), -1.0),
> +        TopsisCriterion::new("highest CPU".to_string(), -2.0),
> +        TopsisCriterion::new("average memory".to_string(), -5.0),
> +        TopsisCriterion::new("highest memory".to_string(), -10.0),
> +    ])
> 
> 
> Are these arbitrary values?

Yes, it's arbitrary. Thomas suggested that memory should be much more
important, because it's "hard" limited compared to CPU time. Average
will mostly be relevant when there is already a highly-committed node.
Not overcommitting nodes in the first place should take precedence over
that of course, so highest is more important than average.

> 
> if you look at my previous patch series,
> https://lists.proxmox.com/pipermail/pve-devel/2022-April/052779.html
> 
> I have also implemented the AHP algorithm (from AHP-TOPSIS), which is
> really useful for finding weights for criteria once you start to have a
> lot of them, by assigning pairwise priorities between the criteria in a
> matrix.
> 
> here is a youtube video about the math:
> https://www.youtube.com/watch?v=J4T70o8gjlk&t=456s
> 
> 
> (I had implemented AHP to dynamically find the weights on service
> start, but the weights could be computed once and set statically)

Yes, I mention this in the cover letter. We can still implement it later
if we want to. It's true that with AHP you only have to choose pairwise
weights, but in a way it doesn't make the task fundamentally easier,
because you have to guess n(n-1)/2 arbitrary values rather than n
arbitrary values (with the four criteria above: 6 instead of 4) ;)




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin
  2022-11-15 15:55   ` DERUMIER, Alexandre
@ 2022-11-16  9:10     ` Fiona Ebner
  0 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-16  9:10 UTC (permalink / raw)
  To: pve-devel, DERUMIER, Alexandre

Am 15.11.22 um 16:55 schrieb DERUMIER, Alexandre:
> 
>> +sub score_nodes_to_start_service {
>> +    my ($self, $sid, $service_node) = @_;
>> +
>> +    my $score_list = eval {
>> +       my $service_usage = get_service_usage($self, $sid, $service_node);
>> +       $self->{scheduler}->score_nodes_to_start_service($service_usage);
>> +    };
>> +    if (my $err = $@) {
>> +       $self->{haenv}->log(
>> +           'err',
>> +           "unable to score nodes according to static usage for service '$sid' - $err",
>> +       );
>> +       # TODO maybe use service count as fallback?
> 
> Maybe you could add "service count" as a criterion with the lowest
> priority?
> 
> (I have done it like this, if I remember correctly)
> 

The fallback is intended to be used when something goes wrong with the
calculation or interfacing with the Rust backend. So we can't add the
fallback in the Rust backend, but would need to do it in the Perl module
directly.

The TOPSIS calculation is only expected to fail if an f64::NAN ends up in
the TOPSIS matrix somehow, which also shouldn't happen in the absence of
bugs. But it's good to be prepared for unexpected bugs in the interfacing
and calculation, of course.
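
A hypothetical sketch of such a Perl-side fallback (the per-node service
counts are made-up bookkeeping, purely for illustration):

    my $score_list = eval {
        my $service_usage = get_service_usage($self, $sid, $service_node);
        $self->{scheduler}->score_nodes_to_start_service($service_usage);
    };
    if (my $err = $@) {
        $self->{haenv}->log('err', "unable to score nodes for '$sid', falling back to service counts - $err");
        # fall back to ordering nodes by how many HA services they already run
        my $counts = $self->{'service-counts'}; # hypothetical bookkeeping
        $score_list = [ sort { $counts->{$a} <=> $counts->{$b} } keys %$counts ];
    }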




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module
  2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module Fiona Ebner
@ 2022-11-16  9:18   ` Thomas Lamprecht
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16  9:18 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
> Models usage of guests and nodes, and allows scoring nodes on which to
> start a new service via TOPSIS. For this scoring, each node in turn is
> considered as if the service was already running on it.
>
> CPU and memory usage are used as criteria, with memory being weighted
> much more, because it's a truly limited resource. For both, CPU and
> memory, highest usage among nodes (weighted more, as ideally no node
> should be overcommited) and average usage of all nodes (to still be

overcommitted

> able to distinguish in case there already is a more highly commited

committed

> node) are considered.

something like this would be also great to have in the docs, as it gives
some relatively high level (thus user digestible) insight.





^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured
  2022-11-16  7:14     ` Thomas Lamprecht
@ 2022-11-16  9:37       ` Fiona Ebner
  0 siblings, 0 replies; 41+ messages in thread
From: Fiona Ebner @ 2022-11-16  9:37 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox VE development discussion

Am 16.11.22 um 08:14 schrieb Thomas Lamprecht:
> Am 11/11/2022 um 10:28 schrieb Fiona Ebner:
>> Am 10.11.22 um 15:37 schrieb Fiona Ebner:
>>> @@ -206,11 +207,30 @@ my $valid_service_states = {
>>>  sub recompute_online_node_usage {
>> So I was a bit worried that recompute_online_node_usage() would become
>> too inefficient with the new add_service_usage_to_node() overhead from
>> needing to read the guest configs. I now tested it with ~300 HA services
>> (minimal containers) running on my virtual test cluster.
>>
>> Timings with 'basic' mode were between 0.0004 - 0.001 seconds
>> Timings with 'static' mode were between 0.007 - 0.012 seconds
>>
>> While about a 10-fold increase, it's not too dramatic at least. I guess
>> that's what the caching of cfs files is for :)
>>
>> Still, the function is currently not only called in the main loop in
>> manage(), but also in next_state_recovery() and change_service_state().
>>
>> With, say, 400 HA services each on 5 nodes, if a node fails there's
>> 400 calls from changing to freeze
> 
> huh, freeze should only happen on graceful shutdown of a node, not
> if it fails?

Sorry, I meant fence not freeze.

> 
>> 400 calls from changing to recovery
>> 400 calls in next_state_recovery
>> 400 calls from changing to started
>> If we take a generous estimate that each call takes 0.1 seconds (there's
>> 2000 services in total), that's 40+80+40 seconds in 3 bursts during the
>> fencing and recovery period.
> 
> doesn't that lead to overly long run windows between watchdog updates?
> 
>>
>> Is that acceptable? Should I try to optimize how often the function is
>> called?
>>
> 
> hmm, a quick look wouldn't hurt, but not required for now IMO - if it can
> interfere with watchdog updates I'd sneak in updating it once in between
> though.
> 

Yes, from a quick look that might become a problem, exactly because the
delays happen in bursts (all services change state in a single manage()
run).

Not sure how you would trigger the update, because that would need to
happen in the CRM AFAIU?

There is a fixme comment in CRM.pm's work() to set an alert timer and
enforce working for at most $max_time seconds. That would of course help
here.

Getting rid of superfluous recompute_online_node_usage() calls should
also not be impossible. We'd need to ensure that we add service usage
(that already is done in recovery and next_state_started) and remove
service usage (removing is not implemented right now) when changing
nodes or states. Then it'd be enough to call
recompute_online_node_usage() once per cycle and it'd be a huge
improvement compared to now. Additionally, we could call it whenever we
iterated a certain number of services, just to be sure.
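
For example, a hypothetical counterpart in the Usage plugins could look like
this (the 'node-usage' hash is an assumption about the plugin internals):

    sub remove_service_usage_from_node {
        my ($self, $nodename, $sid, $service_node) = @_;

        my $service_usage = get_service_usage($self, $sid, $service_node);
        for my $key (qw(maxcpu maxmem)) {
            $self->{'node-usage'}->{$nodename}->{$key} -= $service_usage->{$key};
        }
        return;
    }

With that in place, state changes could update the bookkeeping incrementally
and the full recompute would only be needed once per manage() cycle.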

> 
> ps. maybe you can have some of that info/stats here in the commit message
> of this patch.

Sure.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-16  8:46     ` Fiona Ebner
  2022-11-16  8:59       ` Thomas Lamprecht
@ 2022-11-16 12:38       ` DERUMIER, Alexandre
  2022-11-16 12:52         ` Thomas Lamprecht
  1 sibling, 1 reply; 41+ messages in thread
From: DERUMIER, Alexandre @ 2022-11-16 12:38 UTC (permalink / raw)
  To: pve-devel, t.lamprecht

> > 
> > same here. As this is just internal we can also adapt it later
> > though..
> > 
> 
> Well, the Rust backend also uses 'maxcpu' and 'maxmem' currently :/
> So at least in Usage/Static.pm, it will be more difficult to change
> later.
> 

Hi,

I still have a pending virtio-mem patch series, where I introduce a real
maxmemory option in the config file:
https://www.mail-archive.com/pve-devel@lists.proxmox.com/msg09395.html


I think we should use the same variable names as in the config files to
avoid confusion.

maxmemory: the max memory we can hotplug when virtio-mem is used
memory: the currently online memory modules
balloon: the min size we can balloon down to.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method
  2022-11-16 12:38       ` DERUMIER, Alexandre
@ 2022-11-16 12:52         ` Thomas Lamprecht
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-16 12:52 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel

Hi,

Am 16/11/2022 um 13:38 schrieb DERUMIER, Alexandre:
>>> same here. As this is just internal we can also adapt it later
>>> though..
>>>
>>
>> Well, the Rust backend also uses 'maxcpu' and 'maxmem' currently :/
>> So at least in Usage/Static.pm, it will be more difficult to change
>> later.
>>
> I still have a virtio-mem pending patch series,
> where I introduce a real maxmemory option in config file.
> https://www.mail-archive.com/pve-devel@lists.proxmox.com/msg09395.html
> 
> 
> I think we should use the same variable names as in the config files to
> avoid confusion.

not sure if it's that important for config settings and internal run state/function
params/return values to be named identically, or am I missing something?

> 
> maxmemory: the max memory we can hotplug when virtio-mem is used
> memory: the currently online memory modules
> balloon: the min size we can balloon down to.
> 
> 

yeah, I'd prefer either a format string with a default key:

memory: [current=]\d+[,max=\d+]

or:

memory: ...
memory-max: ...

Keeping the common prefix first groups those settings close together under
alphabetical sorting in API docs or CLI usage, making them easier for users to
find, and having an actual separator is just nicer in general.
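
The first variant could hypothetically be described as a JSONSchema format
along these lines (names and descriptions made up for illustration):

    my $memory_fmt = {
        current => {
            description => "Currently plugged-in memory in MiB.",
            type => 'integer',
            default_key => 1,
        },
        max => {
            description => "Maximum hotpluggable memory in MiB.",
            type => 'integer',
            optional => 1,
        },
    };
    PVE::JSONSchema::register_format('pve-qm-memory', $memory_fmt);

A config value like 'memory: 2048,max=8192' would then be parsed with
PVE::JSONSchema::parse_property_string().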




^ permalink raw reply	[flat|nested] 41+ messages in thread

* [pve-devel] applied: [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options
  2022-11-10 14:37 ` [pve-devel] [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options Fiona Ebner
@ 2022-11-17 11:52   ` Thomas Lamprecht
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Lamprecht @ 2022-11-17 11:52 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

Am 10/11/2022 um 15:37 schrieb Fiona Ebner:
> Initially, with a setting for HA to switch between basic (just count
> services) and static (use static node and resource information).
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  data/PVE/DataCenterConfig.pm | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2022-11-17 11:52 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-10 14:37 [pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/3] initial commit Fiona Ebner
2022-11-15 10:15   ` [pve-devel] applied: " Wolfgang Bumiller
2022-11-15 15:39   ` [pve-devel] " DERUMIER, Alexandre
2022-11-16  9:09     ` Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/3] add pve_static module Fiona Ebner
2022-11-16  9:18   ` Thomas Lamprecht
2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-resource-scheduling 3/3] add Debian packaging Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve-rs: add resource scheduling module Fiona Ebner
2022-11-15 10:16   ` [pve-devel] applied-series: " Wolfgang Bumiller
2022-11-10 14:37 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] add basic test for resource scheduling Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH manager 1/3] pvestatd: broadcast static node information Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH v3 manager 2/3] cluster resources: add cgroup-mode to node properties Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH v2 manager 3/3] ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH cluster 1/1] datacenter config: add cluster resource scheduling (crs) options Fiona Ebner
2022-11-17 11:52   ` [pve-devel] applied: " Thomas Lamprecht
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 01/11] env: add get_static_node_stats() method Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 02/11] resources: add get_static_stats() method Fiona Ebner
2022-11-15 13:28   ` Thomas Lamprecht
2022-11-16  8:46     ` Fiona Ebner
2022-11-16  8:59       ` Thomas Lamprecht
2022-11-16 12:38       ` DERUMIER, Alexandre
2022-11-16 12:52         ` Thomas Lamprecht
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 03/11] add Usage base plugin and Usage::Basic plugin Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 04/11] manager: select service node: add $sid to parameters Fiona Ebner
2022-11-16  7:17   ` Thomas Lamprecht
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 05/11] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 06/11] usage: add Usage::Static plugin Fiona Ebner
2022-11-15 15:55   ` DERUMIER, Alexandre
2022-11-16  9:10     ` Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 07/11] env: add get_crs_settings() method Fiona Ebner
2022-11-16  7:05   ` Thomas Lamprecht
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 08/11] manager: set resource scheduler mode upon init Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured Fiona Ebner
2022-11-11  9:28   ` Fiona Ebner
2022-11-16  7:14     ` Thomas Lamprecht
2022-11-16  9:37       ` Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 10/11] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
2022-11-10 14:37 ` [pve-devel] [PATCH ha-manager 11/11] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
2022-11-10 14:38 ` [pve-devel] [PATCH docs 1/1] ha: add section about scheduler modes Fiona Ebner
2022-11-15 13:12 ` [pve-devel] partially-applied: [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager Thomas Lamprecht
