* [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node
@ 2023-03-21 12:33 Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/2] pve ha: fix scoring issue when a node is overcommitted compared to others Fiona Ebner
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Fiona Ebner @ 2023-03-21 12:33 UTC (permalink / raw)
To: pve-devel
The current algorithm is blind in cases where the static node stats
are the same and there is (at least) one node that is overcommitted
when compared to the others.
The problem is that using the linear average produces the same values
for the alternatives (except for the overcommitted node). Fix it by
using a nonlinear average.
The patch proxmox-perl-rs 2/2 requires a build-dependency bump for
proxmox-resource-scheduling.
proxmox-resource-scheduling:
Fiona Ebner (2):
pve ha: fix scoring issue when a node is overcommitted compared to
others
pve ha: fix typo in comment
src/pve_static.rs | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
proxmox-perl-rs:
Fiona Ebner (2):
pve: test: resource scheduling: use dedicated functions for tests
pve: test: resource scheduling: add test with overcommitted node
pve-rs/test/resource_scheduling.pl | 102 +++++++++++++++++++++--------
1 file changed, 73 insertions(+), 29 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH proxmox-resource-scheduling 1/2] pve ha: fix scoring issue when a node is overcommitted compared to others
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
@ 2023-03-21 12:33 ` Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/2] pve ha: fix typo in comment Fiona Ebner
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2023-03-21 12:33 UTC (permalink / raw)
To: pve-devel
When nodes have different stats, the sum of percentage values will be
different for different alternatives, so the linear average is enough.
But when nodes have the same stats, this is not the case, the sum will
be the same, thus the average won't influence the scoring. If there is
an already overcommitted node, all alternatives besides the already
overcommitted node would be scored the same.
To fix it, use the squares of percentages instead, where more evenly
distributed usage across nodes will lead to a smaller value and thus
better scoring.
It's not really necessary to divide by length or take the sqrt, but it
seemed nicer to have something that would give 1.0 if all inputs are
1.0.
Reported-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Sorry about the stupid mistake :(
src/pve_static.rs | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/pve_static.rs b/src/pve_static.rs
index 345c0a2..6663b70 100644
--- a/src/pve_static.rs
+++ b/src/pve_static.rs
@@ -79,11 +79,11 @@ pub fn score_nodes_to_start_service(
.iter()
.enumerate()
.map(|(target_index, _)| {
- // all of these are as percentages to be comparable across nodes
+ // Base values on percentages to allow comparing nodes with different stats.
let mut highest_cpu = 0.0;
- let mut sum_cpu = 0.0;
+ let mut squares_cpu = 0.0;
let mut highest_mem = 0.0;
- let mut sum_mem = 0.0;
+ let mut squares_mem = 0.0;
for (index, node) in nodes.iter().enumerate() {
let new_cpu = if index == target_index {
@@ -92,7 +92,7 @@ pub fn score_nodes_to_start_service(
node.cpu
} / (node.maxcpu as f64);
highest_cpu = f64::max(highest_cpu, new_cpu);
- sum_cpu += new_cpu;
+ squares_cpu += new_cpu.powi(2);
let new_mem = if index == target_index {
node.mem + service.maxmem
@@ -101,13 +101,13 @@ pub fn score_nodes_to_start_service(
} as f64
/ node.maxmem as f64;
highest_mem = f64::max(highest_mem, new_mem);
- sum_mem += new_mem;
+ squares_mem += new_mem.powi(2);
}
PveTopsisAlternative {
- average_cpu: sum_cpu / len as f64,
+ average_cpu: (squares_cpu / len as f64).sqrt(),
highest_cpu,
- average_memory: sum_mem / len as f64,
+ average_memory: (squares_mem / len as f64).sqrt(),
highest_memory: highest_mem,
}
.into()
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH proxmox-resource-scheduling 2/2] pve ha: fix typo in comment
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/2] pve ha: fix scoring issue when a node is overcommitted compared to others Fiona Ebner
@ 2023-03-21 12:33 ` Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve: test: resource scheduling: use dedicated functions for tests Fiona Ebner
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2023-03-21 12:33 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
src/pve_static.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/pve_static.rs b/src/pve_static.rs
index 6663b70..814d44f 100644
--- a/src/pve_static.rs
+++ b/src/pve_static.rs
@@ -9,11 +9,11 @@ use crate::topsis;
pub struct StaticNodeUsage {
/// Hostname of the node.
pub name: String,
- /// CPU utilization. Can be more than `maxcpu` if overcommited.
+ /// CPU utilization. Can be more than `maxcpu` if overcommitted.
pub cpu: f64,
/// Total number of CPUs.
pub maxcpu: usize,
- /// Used memory in bytes. Can be more than `maxmem` if overcommited.
+ /// Used memory in bytes. Can be more than `maxmem` if overcommitted.
pub mem: usize,
/// Total memory in bytes.
pub maxmem: usize,
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH proxmox-perl-rs 1/2] pve: test: resource scheduling: use dedicated functions for tests
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/2] pve ha: fix scoring issue when a node is overcommitted compared to others Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/2] pve ha: fix typo in comment Fiona Ebner
@ 2023-03-21 12:33 ` Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] pve: test: resource scheduling: add test with overcommitted node Fiona Ebner
2023-03-21 14:25 ` [pve-devel] applied-series: [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue " Thomas Lamprecht
4 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2023-03-21 12:33 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-rs/test/resource_scheduling.pl | 83 +++++++++++++++++-------------
1 file changed, 47 insertions(+), 36 deletions(-)
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
index 4f5105f..9c48178 100755
--- a/pve-rs/test/resource_scheduling.pl
+++ b/pve-rs/test/resource_scheduling.pl
@@ -7,43 +7,54 @@ use Test::More;
use PVE::RS::ResourceScheduling::Static;
-my $static = PVE::RS::ResourceScheduling::Static->new();
-is(scalar($static->list_nodes()->@*), 0, 'node list empty');
-$static->add_node("A", 10, 100_000_000_000);
-is(scalar($static->list_nodes()->@*), 1, '1 node added');
-$static->add_node("B", 20, 200_000_000_000);
-is(scalar($static->list_nodes()->@*), 2, '2nd node');
-$static->add_node("C", 30, 300_000_000_000);
-is(scalar($static->list_nodes()->@*), 3, '3rd node');
-$static->remove_node("C");
-is(scalar($static->list_nodes()->@*), 2, '3rd removed should be 2');
-ok($static->contains_node("A"), 'should contain a node A');
-ok($static->contains_node("B"), 'should contain a node B');
-ok(!$static->contains_node("C"), 'should not contain a node C');
-
-my $service = {
- maxcpu => 4,
- maxmem => 20_000_000_000,
-};
-
-for (my $i = 0; $i < 15; $i++) {
- my $score_list = $static->score_nodes_to_start_service($service);
-
- # imitate HA manager
- my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };
- my @nodes = sort {
- $scores->{$a} <=> $scores->{$b} || $a cmp $b
- } keys $scores->%*;
-
- if ($i % 3 == 2) {
- is($nodes[0], "A", 'first should be A');
- is($nodes[1], "B", 'second should be A');
- } else {
- is($nodes[0], "B", 'first should be B');
- is($nodes[1], "A", 'second should be A');
- }
+sub test_basic {
+ my $static = PVE::RS::ResourceScheduling::Static->new();
+ is(scalar($static->list_nodes()->@*), 0, 'node list empty');
+ $static->add_node("A", 10, 100_000_000_000);
+ is(scalar($static->list_nodes()->@*), 1, '1 node added');
+ $static->add_node("B", 20, 200_000_000_000);
+ is(scalar($static->list_nodes()->@*), 2, '2nd node');
+ $static->add_node("C", 30, 300_000_000_000);
+ is(scalar($static->list_nodes()->@*), 3, '3rd node');
+ $static->remove_node("C");
+ is(scalar($static->list_nodes()->@*), 2, '3rd removed should be 2');
+ ok($static->contains_node("A"), 'should contain a node A');
+ ok($static->contains_node("B"), 'should contain a node B');
+ ok(!$static->contains_node("C"), 'should not contain a node C');
+}
- $static->add_service_usage_to_node($nodes[0], $service);
+sub test_balance {
+ my $static = PVE::RS::ResourceScheduling::Static->new();
+ $static->add_node("A", 10, 100_000_000_000);
+ $static->add_node("B", 20, 200_000_000_000);
+
+ my $service = {
+ maxcpu => 4,
+ maxmem => 20_000_000_000,
+ };
+
+ for (my $i = 0; $i < 15; $i++) {
+ my $score_list = $static->score_nodes_to_start_service($service);
+
+ # imitate HA manager
+ my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };
+ my @nodes = sort {
+ $scores->{$a} <=> $scores->{$b} || $a cmp $b
+ } keys $scores->%*;
+
+ if ($i % 3 == 2) {
+ is($nodes[0], "A", 'first should be A');
+ is($nodes[1], "B", 'second should be A');
+ } else {
+ is($nodes[0], "B", 'first should be B');
+ is($nodes[1], "A", 'second should be A');
+ }
+
+ $static->add_service_usage_to_node($nodes[0], $service);
+ }
}
+test_basic();
+test_balance();
+
done_testing();
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] [PATCH proxmox-perl-rs 2/2] pve: test: resource scheduling: add test with overcommitted node
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
` (2 preceding siblings ...)
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve: test: resource scheduling: use dedicated functions for tests Fiona Ebner
@ 2023-03-21 12:33 ` Fiona Ebner
2023-03-21 14:25 ` [pve-devel] applied-series: [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue " Thomas Lamprecht
4 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2023-03-21 12:33 UTC (permalink / raw)
To: pve-devel
which will fail with librust-proxmox-resource-scheduling-dev=0.1.0-1
Reported-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Build-depends on fixed proxmox-resource-scheduling
pve-rs/test/resource_scheduling.pl | 33 ++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/pve-rs/test/resource_scheduling.pl b/pve-rs/test/resource_scheduling.pl
index 9c48178..fbedfe1 100755
--- a/pve-rs/test/resource_scheduling.pl
+++ b/pve-rs/test/resource_scheduling.pl
@@ -54,7 +54,40 @@ sub test_balance {
}
}
+sub test_overcommitted {
+ my $static = PVE::RS::ResourceScheduling::Static->new();
+ $static->add_node("A", 4, 4_102_062_080);
+ $static->add_node("B", 4, 4_102_062_080);
+ $static->add_node("C", 4, 4_102_053_888);
+ $static->add_node("D", 4, 4_102_053_888);
+
+ my $service = {
+ maxcpu => 1,
+ maxmem => 536_870_912,
+ };
+
+ $static->add_service_usage_to_node("A", $service);
+ $static->add_service_usage_to_node("A", $service);
+ $static->add_service_usage_to_node("A", $service);
+ $static->add_service_usage_to_node("B", $service);
+ $static->add_service_usage_to_node("A", $service);
+
+ my $score_list = $static->score_nodes_to_start_service($service);
+
+ # imitate HA manager
+ my $scores = { map { $_->[0] => -$_->[1] } $score_list->@* };
+ my @nodes = sort {
+ $scores->{$a} <=> $scores->{$b} || $a cmp $b
+ } keys $scores->%*;
+
+ is($nodes[0], "C", 'first should be C');
+ is($nodes[1], "D", 'second should be D');
+ is($nodes[2], "B", 'third should be B');
+ is($nodes[3], "A", 'fourth should be A');
+}
+
test_basic();
test_balance();
+test_overcommitted();
done_testing();
--
2.30.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pve-devel] applied-series: [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
` (3 preceding siblings ...)
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] pve: test: resource scheduling: add test with overcommitted node Fiona Ebner
@ 2023-03-21 14:25 ` Thomas Lamprecht
4 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2023-03-21 14:25 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
Am 21/03/2023 um 13:33 schrieb Fiona Ebner:
> The current algorithm is blind in cases where the static node stats
> are the same and there is (at least) one node that is overcommitted
> when compared to the others.
>
> The problem is that using the linear average produces the same values
> for the alternatives (except for the overcommitted node). Fix it by
> using a nonlinear average.
>
> The patch proxmox-perl-rs 2/2 requires a build-dependency bump for
> proxmox-resource-scheduling.
>
>
> proxmox-resource-scheduling:
>
> Fiona Ebner (2):
> pve ha: fix scoring issue when a node is overcommitted compared to
> others
> pve ha: fix typo in comment
>
> src/pve_static.rs | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
>
> proxmox-perl-rs:
>
> Fiona Ebner (2):
> pve: test: resource scheduling: use dedicated functions for tests
> pve: test: resource scheduling: add test with overcommitted node
>
> pve-rs/test/resource_scheduling.pl | 102 +++++++++++++++++++++--------
> 1 file changed, 73 insertions(+), 29 deletions(-)
>
applied series, thanks!
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-03-21 14:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-21 12:33 [pve-devel] [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue with overcommitted node Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 1/2] pve ha: fix scoring issue when a node is overcommitted compared to others Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-resource-scheduling 2/2] pve ha: fix typo in comment Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 1/2] pve: test: resource scheduling: use dedicated functions for tests Fiona Ebner
2023-03-21 12:33 ` [pve-devel] [PATCH proxmox-perl-rs 2/2] pve: test: resource scheduling: add test with overcommitted node Fiona Ebner
2023-03-21 14:25 ` [pve-devel] applied-series: [PATCH-SERIES proxmox-{resource-scheduling, perl-rs}] fix scoring issue " Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox