* [pve-devel] [PATCH V3 pve-ha-manager 0/2] POC/RFC: resource-aware HA manager
From: Alexandre Derumier @ 2021-12-21 15:13 UTC (permalink / raw)
To: pve-devel
Hi,
this is a proof of concept to implement resource-aware HA.
The current implementation is really basic,
simply balancing the number of services on each node.
I have had some real production cases where a node fails, and the restarted vms
impact other nodes because of too much cpu/ram usage.
Changelog v2:
- merge main code && Sim code into the same patch for now (I'll split them later)
- clean up following all of Thomas's review comments (thanks again)
- add more comments in the code
- check storage for lxc too
- use maxmem for windows vms
Changelog v3:
- fix vm/ct config read (need to specify the node)
- fix storage_availability_check params
- classify nodes with low/medium/high thresholds for better balancing.
  We try to fill the nodes with lower usage first, until the threshold is reached.
I still need to add the missing storage availability test.
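As a rough illustration of the low/medium/high classification idea (a sketch only; the function name and data shapes are assumptions, not the patch's actual Perl API, though the 20/50/80 percent thresholds match the proposed values):

```python
# Sketch of the low/medium/high node classification used for balancing.
# A node under all three thresholds for both cpu and mem gets prio 3,
# under medium and high gets 2, under only high gets 1, otherwise 0.
# Higher prio means the node is preferred (filled first).

THRESHOLDS = [("low", 20), ("medium", 50), ("high", 80)]

def threshold_prio(mem_percent, cpu_percent):
    """Count how many threshold levels the node stays under for both resources."""
    prio = 0
    for _name, limit in THRESHOLDS:
        if mem_percent <= limit and cpu_percent <= limit:
            prio += 1
    return prio

# a lightly-loaded node ranks above a busier one
assert threshold_prio(10, 10) > threshold_prio(60, 60)
```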
Alexandre Derumier (2):
add resource awareness manager
add test-basic0
src/PVE/HA/Env.pm | 33 ++++
src/PVE/HA/Env/PVE2.pm | 177 +++++++++++++++++
src/PVE/HA/Manager.pm | 274 ++++++++++++++++++++++++++-
src/PVE/HA/Sim/Hardware.pm | 61 ++++++
src/PVE/HA/Sim/TestEnv.pm | 50 ++++-
src/test/test-basic0/README | 1 +
src/test/test-basic0/cmdlist | 4 +
src/test/test-basic0/hardware_status | 5 +
src/test/test-basic0/log.expect | 52 +++++
src/test/test-basic0/manager_status | 1 +
src/test/test-basic0/node_stats | 5 +
src/test/test-basic0/service_config | 5 +
src/test/test-basic0/service_stats | 5 +
13 files changed, 664 insertions(+), 9 deletions(-)
create mode 100644 src/test/test-basic0/README
create mode 100644 src/test/test-basic0/cmdlist
create mode 100644 src/test/test-basic0/hardware_status
create mode 100644 src/test/test-basic0/log.expect
create mode 100644 src/test/test-basic0/manager_status
create mode 100644 src/test/test-basic0/node_stats
create mode 100644 src/test/test-basic0/service_config
create mode 100644 src/test/test-basic0/service_stats
--
2.30.2
* [pve-devel] [PATCH V3 pve-ha-manager 1/2] add resource awareness manager
From: Alexandre Derumier @ 2021-12-21 15:13 UTC (permalink / raw)
To: pve-devel
This new implementation uses a best-fit heuristic vector packing with constraints support.
- We compute node memory/cpu and vm memory/cpu average stats over the last 20 minutes.
For each resource:
- First, we order the pending recovery state services by memory, then cpu usage.
  Memory is more important here, because a vm can't start if the target node doesn't have enough memory.
- Then, we check the possible target node constraints (storage availability, node has enough cpu/ram, node has enough cores, ...).
  (This could be extended with other constraints like vm affinity/anti-affinity, cpu compatibility, ...)
- We classify the nodes with low/medium/high cpu/mem thresholds.
- Then we compute a node weight as the euclidean distance between the vm usage vector and the node's available resources vector (cpu/ram).
- We order the node list by:
  - group prio,
  - soft constraint prio (anti-affinity),
  - threshold prio (try to recover to low threshold group nodes first, then medium, then high),
  - distance weight (if the vm uses 1 GB ram/1% cpu, node1 has 2 GB ram/2% cpu free and node2 has 4 GB ram/4% cpu free, node1 will be chosen because it's the nearest to the vm usage),
  - number of HA vm/CT,
  and choose the first node of the list.
- We add the recovered vm's cpu/ram to the target node stats. (This is only a best-effort estimation, as the vm start is async on the target lrm and could fail, ...)
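The distance-based choice from the 1 GB ram/1% cpu example above can be sketched like this (an illustrative Python translation of the idea, not the patch's actual Perl code; vectors are (cpu_percent, mem_gb)):

```python
import math

def euclidean_distance(vec_a, vec_b):
    """Distance between a vm's usage vector and a node's free-resource vector."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(vec_a, vec_b)))

# vm uses 1% cpu and 1 GB ram
vm = (1, 1)
node1_free = (2, 2)   # 2% cpu, 2 GB free
node2_free = (4, 4)   # 4% cpu, 4 GB free

# node1 is nearer to the vm's usage, so best-fit picks it
assert euclidean_distance(vm, node1_free) < euclidean_distance(vm, node2_free)
```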
I have kept the HA group node prio and the other ordering criteria,
so this doesn't break the current tests, and we can easily add an option at the
datacenter level to enable/disable it.
It would be easy to later implement some kind of vm auto-migration when a node
uses too much cpu/ram, reusing the same node selection algorithm.
I have added a basic test; I'll add more tests later if this patch series is ok for you.
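The multi-key node ordering described above can be sketched as a single lexicographic sort (the candidate data here is made up for illustration; key names are assumptions, not the patch's actual fields):

```python
# Ordering keys from the patch: group prio desc, soft constraint prio asc,
# threshold prio desc, distance weight asc, number of HA services asc, name asc.
nodes = [
    {"name": "node1", "prio": 1, "soft": 0, "threshold": 3, "weight": 1.4, "usage": 2},
    {"name": "node2", "prio": 1, "soft": 0, "threshold": 3, "weight": 4.2, "usage": 1},
    {"name": "node3", "prio": 1, "soft": 0, "threshold": 1, "weight": 0.5, "usage": 0},
]

best = sorted(
    nodes,
    key=lambda n: (-n["prio"], n["soft"], -n["threshold"], n["weight"], n["usage"], n["name"]),
)[0]

# node3 has the lowest distance but a worse threshold class, so node1 wins
assert best["name"] == "node1"
```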
Some good literature about heuristics:
Microsoft Hyper-V implementation:
- http://kunaltalwar.org/papers/VBPacking.pdf
- https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/virtualization.pdf
Variable size vector bin packing heuristics:
- https://hal.archives-ouvertes.fr/hal-00868016v2/document
Algorithm comparison (first-fit, worst-fit, best-fit, ...):
- http://www.diva-portal.org/smash/get/diva2:1261137/FULLTEXT02.pdf
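The per-vm stats are derived by taking the 95th percentile over the last 20 minutes of one-minute RRD samples; a roughly equivalent sketch in Python (nearest-rank style, mirroring the index computation in the patch; names are illustrative):

```python
def percentile(p, values):
    """Return the p-th percentile of a list of samples (nearest-rank style).

    Values must be sorted numerically; the index is the percentile applied
    to the last index, as in the patch's helper.
    """
    if not values:
        return 0
    idx = int(p * (len(values) - 1) / 100)
    return sorted(values)[idx]

# e.g. 20 one-minute cpu samples; the 95th percentile keeps the sustained
# peaks while smoothing out how the samples happen to be ordered
cpu_samples = [0.10, 0.12, 0.11, 0.50, 0.13] * 4
```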
---
src/PVE/HA/Env.pm | 33 +++++
src/PVE/HA/Env/PVE2.pm | 177 ++++++++++++++++++++++++
src/PVE/HA/Manager.pm | 274 +++++++++++++++++++++++++++++++++++--
src/PVE/HA/Sim/Hardware.pm | 61 +++++++++
src/PVE/HA/Sim/TestEnv.pm | 50 ++++++-
5 files changed, 586 insertions(+), 9 deletions(-)
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index ac569a9..d957aa9 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -269,4 +269,37 @@ sub get_ha_settings {
return $self->{plug}->get_ha_settings();
}
+sub get_node_rrd_stats {
+ my ($self, $node) = @_;
+
+ return $self->{plug}->get_node_rrd_stats($node);
+}
+
+sub get_vm_rrd_stats {
+ my ($self, $vmid, $percentile) = @_;
+
+ return $self->{plug}->get_vm_rrd_stats($vmid, $percentile);
+}
+
+sub read_vm_ct_config {
+ my ($self, $vmid, $type) = @_;
+
+ if ($type eq 'vm') {
+ return $self->{plug}->read_vm_config($vmid);
+ } elsif ($type eq 'ct') {
+ return $self->{plug}->read_ct_config($vmid);
+ }
+}
+
+sub read_storecfg {
+ my ($self) = @_;
+
+ return $self->{plug}->read_storecfg();
+}
+
+sub check_storage_availability {
+ my ($self, $vmconf, $type, $node, $storecfg) = @_;
+
+ return $self->{plug}->check_storage_availability($vmconf, $type, $node, $storecfg);
+}
1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 5e0a683..1b1b4b0 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -12,6 +12,12 @@ use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file
use PVE::DataCenterConfig;
use PVE::INotify;
use PVE::RPCEnvironment;
+use PVE::API2Tools;
+use PVE::QemuConfig;
+use PVE::QemuServer;
+use PVE::LXC::Config;
+use PVE::Storage;
+use RRDs;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::Env;
@@ -459,4 +465,175 @@ sub get_ha_settings {
return $datacenterconfig->{ha};
}
+sub get_node_rrd_stats {
+ my ($self, $node) = @_;
+
+ my $rrd = PVE::Cluster::rrd_dump();
+ my $members = PVE::Cluster::get_members();
+
+ my $stats = PVE::API2Tools::extract_node_stats($node, $members, $rrd);
+
+ return $stats;
+}
+
+sub get_vm_rrd_stats {
+ my ($self, $vmid, $percentile) = @_;
+
+ my $rrdname = "pve2-vm/$vmid";
+ my $rrddir = "/var/lib/rrdcached/db";
+
+ my $rrd = "$rrddir/$rrdname";
+
+ my $cf = "AVERAGE";
+
+ my $reso = 60;
+ my $ctime = $reso*int(time()/$reso);
+
+ #last 20 minutes
+ my $req_start = $ctime - $reso*20;
+ my $req_end = $ctime - $reso*1;
+
+ my @args = (
+ "-s" => $req_start,
+ "-e" => $req_end,
+ "-r" => $reso,
+ );
+
+ my $socket = "/var/run/rrdcached.sock";
+ push @args, "--daemon" => "unix:$socket" if -S $socket;
+
+ my ($start, $step, $names, $data) = RRDs::fetch($rrd, $cf, @args);
+
+ my @cpu = ();
+ my @mem = ();
+ my @maxmem = ();
+ my @maxcpu = ();
+
+ foreach my $rec (@$data) {
+ my $maxcpu = @$rec[0] || 0;
+ my $cpu = @$rec[1] || 0;
+ my $maxmem = @$rec[2] || 0;
+ my $mem = @$rec[3] || 0;
+ #skip zero values if the vm is down
+ push @cpu, $cpu*$maxcpu if $cpu > 0;
+ push @mem, $mem if $mem > 0;
+ push @maxcpu, $maxcpu if $maxcpu > 0;
+ push @maxmem, $maxmem if $maxmem > 0;
+ }
+
+ my $stats = {};
+
+ $stats->{cpu} = percentile($percentile, \@cpu) || 0;
+ $stats->{mem} = percentile($percentile, \@mem) || 0;
+ $stats->{maxmem} = percentile($percentile, \@maxmem) || 0;
+ $stats->{maxcpu} = percentile($percentile, \@maxcpu) || 0;
+ $stats->{totalcpu} = $stats->{cpu} * $stats->{maxcpu} * 100;
+
+ return $stats;
+}
+
+sub percentile {
+ my ($p, $aref) = @_;
+ my $percentile = int($p * $#{$aref}/100);
+ return (sort { $a <=> $b } @$aref)[$percentile];
+}
+
+sub read_vm_config {
+ my ($self, $vmid) = @_;
+
+ my $conf = undef;
+ my $finalconf = {};
+
+ my $vmlist = PVE::Cluster::get_vmlist();
+ my $node = $vmlist->{ids}->{$vmid}->{node};
+
+ eval { $conf = PVE::QemuConfig->load_config($vmid, $node)};
+ return if !$conf;
+
+ if ( PVE::QemuServer::windows_version($conf->{ostype}) ) {
+ $finalconf->{ostype} = 'windows';
+ } else {
+ $finalconf->{ostype} = $conf->{ostype};
+ }
+
+ PVE::QemuConfig->foreach_volume($conf, sub {
+ my ($ds, $drive) = @_;
+
+ $finalconf->{$ds} = $conf->{$ds};
+ });
+
+ return $finalconf;
+}
+
+sub read_ct_config {
+ my ($self, $vmid) = @_;
+
+ my $conf = undef;
+ my $finalconf = {};
+
+ my $vmlist = PVE::Cluster::get_vmlist();
+ my $node = $vmlist->{ids}->{$vmid}->{node};
+
+ eval { $conf = PVE::LXC::Config->load_config($vmid, $node)};
+ return if !$conf;
+
+ PVE::LXC::Config->foreach_volume($conf, sub {
+ my ($ms, $mountpoint) = @_;
+ $finalconf->{$ms} = $conf->{$ms};
+ });
+
+ return $finalconf;
+}
+
+sub read_storecfg {
+ my ($self) = @_;
+
+ return PVE::Storage::config();
+}
+
+sub check_storage_availability {
+ my ($self, $vmconf, $type, $node, $storecfg) = @_;
+
+ if ($type eq 'vm') {
+ eval { PVE::QemuServer::check_storage_availability($storecfg, $vmconf, $node) };
+ return if $@;
+ } elsif ($type eq 'ct') {
+ eval { check_lxc_storage_availability($storecfg, $vmconf, $node) };
+ return if $@;
+ }
+ return 1;
+}
+
+
+
+##copied from PVE::LXC::Migrate; should be added as PVE::LXC::check_storage_availability like in QemuServer
+sub check_lxc_storage_availability {
+ my ($storecfg, $conf, $node) = @_;
+
+ PVE::LXC::Config->foreach_volume_full($conf, { include_unused => 1 }, sub {
+ my ($ms, $mountpoint) = @_;
+
+ my $volid = $mountpoint->{volume};
+ my $type = $mountpoint->{type};
+
+ # skip dev/bind mps when shared
+ if ($type ne 'volume') {
+ if ($mountpoint->{shared}) {
+ return;
+ } else {
+ die "cannot migrate local $type mount point '$ms'\n";
+ }
+ }
+
+ my ($storage, $volname) = PVE::Storage::parse_volume_id($volid, 1) if $volid;
+ die "can't determine assigned storage for mount point '$ms'\n" if !$storage;
+
+ # check if storage is available on both nodes
+ my $scfg = PVE::Storage::storage_check_enabled($storecfg, $storage);
+ PVE::Storage::storage_check_enabled($storecfg, $storage, $node);
+
+ die "content type 'rootdir' is not available on storage '$storage'\n"
+ if !$scfg->{content}->{rootdir};
+ });
+}
1;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1c66b43..80624f4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -1,3 +1,4 @@
+
package PVE::HA::Manager;
use strict;
@@ -394,8 +395,20 @@ sub manage {
my $repeat = 0;
$self->recompute_online_node_usage();
-
- foreach my $sid (sort keys %$ss) {
+ $self->recompute_online_node_stats();
+
+ $self->get_service_stats($ss);
+ $self->{storecfg} = $haenv->read_storecfg();
+
+ foreach my $sid (
+ #order vms by size, bigger mem first, then bigger cpu
+ #could be improved with the bubblesearch heuristic
+ #https://www.cs.tufts.edu/~nr/cs257/archive/michael-mitzenmacher/bubblesearch.pdf
+ sort {
+ $ss->{$b}->{stats}->{memg} <=> $ss->{$a}->{stats}->{memg} ||
+ $ss->{$b}->{stats}->{totalcpuround} <=> $ss->{$a}->{stats}->{totalcpuround} ||
+ $ss->{$a}->{type} cmp $ss->{$b}->{type}}
+ keys %$ss) {
my $sd = $ss->{$sid};
my $cd = $sc->{$sid} || { state => 'disabled' };
@@ -802,12 +815,7 @@ sub next_state_recovery {
$self->recompute_online_node_usage(); # we want the most current node state
- my $recovery_node = select_service_node(
- $self->{groups},
- $self->{online_node_usage},
- $cd,
- $sd->{node},
- );
+ my $recovery_node = $self->find_bestfit_node_target($cd , $sd);
if ($recovery_node) {
my $msg = "recover service '$sid' from fenced node '$fenced_node' to node '$recovery_node'";
@@ -822,6 +830,14 @@ sub next_state_recovery {
$haenv->steal_service($sid, $sd->{node}, $recovery_node);
$self->{online_node_usage}->{$recovery_node}++;
+ #add vm cpu/mem to current node stats (this is an estimation based on last 20min vm stats)
+ my $node_stats = $self->{online_node_stats}->{$recovery_node}->{stats};
+ $node_stats->{totalcpu} += $sd->{stats}->{totalcpu};
+ $node_stats->{mem} += $sd->{stats}->{mem};
+ $node_stats->{totalfreecpu} = (100 * $node_stats->{maxcpu}) - $node_stats->{totalcpu};
+ $node_stats->{freemem} = $node_stats->{maxmem} - $node_stats->{mem};
+
+
# NOTE: $sd *is normally read-only*, fencing is the exception
$cd->{node} = $sd->{node} = $recovery_node;
my $new_state = ($cd->{state} eq 'started') ? 'started' : 'request_stop';
@@ -839,4 +855,246 @@ sub next_state_recovery {
}
}
+sub find_bestfit_node_target {
+ my($self, $cd, $sd) = @_;
+
+ my $online_nodes = $self->{online_node_stats};
+ my $groups = $self->{groups};
+ my $hagroup = get_service_group($groups, $online_nodes, $cd);
+ my ($pri_groups, $group_members_prio) = get_node_priority_groups($hagroup, $online_nodes);
+
+ my $target_nodes = {};
+ foreach my $node (keys %$online_nodes) {
+
+ #### FILTERING NODES WITH HARD CONSTRAINTS (vm can't be started)
+ next if !$self->check_hard_constraints($cd, $sd, $node, $group_members_prio);
+
+ #### compute differents prio
+ $target_nodes->{$node} = $self->add_node_prio($sd, $node, 'distance', $group_members_prio);
+ }
+
+ #order by hagroup prio, soft constraint prio, threshold prio, weight (best fit: lower distance first), number of services, and node name
+ my @target_array = sort {
+ $target_nodes->{$b}->{prio} <=> $target_nodes->{$a}->{prio} ||
+ $target_nodes->{$a}->{soft_constraint_prio} <=> $target_nodes->{$b}->{soft_constraint_prio} ||
+ $target_nodes->{$b}->{threshold_prio} <=> $target_nodes->{$a}->{threshold_prio} ||
+ $target_nodes->{$a}->{weight} <=> $target_nodes->{$b}->{weight} ||
+ $target_nodes->{$a}->{online_node_usage} <=> $target_nodes->{$b}->{online_node_usage} ||
+ $target_nodes->{$a}->{name} cmp $target_nodes->{$b}->{name}
+ } keys %$target_nodes;
+
+ my $target = $target_array[0];
+
+ return $target;
+}
+
+
+sub check_hard_constraints {
+ my ($self, $cd, $sd, $node, $group_members_prio) = @_;
+
+ my $haenv = $self->{haenv};
+ my $vm_stats = $sd->{stats};
+ my $node_stats = $self->{online_node_stats}->{$node}->{stats};
+
+ #node needs to have a prio (restricted group)
+ return if !defined($group_members_prio->{$node});
+
+ #vm can't start if the host has fewer cores
+ return if $node_stats->{maxcpu} < $vm_stats->{maxcpu};
+ #vm can't start if the node doesn't have enough mem for the vm maxmem
+ return if $node_stats->{freemem} < $vm_stats->{maxmem};
+
+
+ #check if the target node has enough mem resources under the threshold
+ my $mem_critical_threshold = 95;
+ my $target_mem_percent = $node_stats->{maxmem} > 0 ? (($node_stats->{mem} + $vm_stats->{mem}) * 100 / $node_stats->{maxmem}) : 0;
+ return if $target_mem_percent > $mem_critical_threshold;
+
+ #check if the target node has enough cpu resources under the threshold
+ my $cpu_critical_threshold = 95;
+ my $target_cpu_percent = $node_stats->{maxcpu} > 0 ? (($node_stats->{totalcpu} + $vm_stats->{totalcpu}) / $node_stats->{maxcpu}) : 0;
+ return if $target_cpu_percent > $cpu_critical_threshold;
+
+ return if !$haenv->check_storage_availability($sd->{vmconf}, $sd->{type}, $node, $self->{storecfg});
+
+ # fixme: check bridge availability
+ # fixme: vm: add a check for cpumodel compatibility ?
+ return 1;
+}
+
+sub compute_soft_constraints {
+
+ my $prio = 0;
+ #fixme : add antiaffinity
+
+ return $prio;
+}
+
+sub compute_node_threshold_prio {
+ my ($node_stats, $vm_stats) = @_;
+
+
+ #order nodes by low/medium/high usage.
+ #these thresholds could be made configurable by the user
+
+ #the high threshold is set at 80% max cpu/ram:
+ #with ksm, node memory tries to stay at max 80%;
+ #cpu at 80% should be a good indicator; checking load and pressure could be implemented too
+
+ my @thresholds = ('low','medium','high');
+ my $mem_thresholds = { low => 20 , medium => 50, high => 80 };
+ my $cpu_thresholds = { low => 20 , medium => 50, high => 80 };
+
+ my $target_mem_percent = $node_stats->{maxmem} > 0 ? (($node_stats->{mem} + $vm_stats->{mem}) * 100 / $node_stats->{maxmem}) : 0;
+ my $target_cpu_percent = $node_stats->{maxcpu} > 0 ? (($node_stats->{totalcpu} + $vm_stats->{totalcpu}) / $node_stats->{maxcpu}) : 0;
+
+ my $prio = 0;
+ foreach my $key (@thresholds) {
+ $prio++ if $target_mem_percent <= $mem_thresholds->{$key} && $target_cpu_percent <= $cpu_thresholds->{$key};
+ }
+
+ return $prio;
+}
+
+sub add_node_prio {
+ my ($self, $sd, $nodename, $method, $group_members_prio) = @_;
+
+ my $vm_stats = $sd->{stats};
+ my $node_stats = $self->{online_node_stats}->{$nodename}->{stats};
+
+ #rounded values for vectors
+ my $vm_totalcpu = $vm_stats->{totalcpuround};
+ my $vm_mem = $vm_stats->{memg};
+ my $node_freecpu = roundup($node_stats->{totalfreecpu}, 10);
+ my $node_freemem = roundup(($node_stats->{freemem}/1024/1024/1024),1);
+
+ my @vec_vm = ($vm_totalcpu, $vm_mem); #? add network usage dimension ?
+ my @vec_node = ($node_freecpu, $node_freemem); #? add network usage dimension ?
+ my $weight = 0;
+ if ($method eq 'distance') {
+ $weight = euclidean_distance(\@vec_vm,\@vec_node);
+ } elsif ($method eq 'dotprod') {
+ $weight = dotprod(\@vec_vm,\@vec_node);
+ }
+
+ my $node = {};
+ $node->{weight} = $weight;
+ $node->{soft_constraint_prio} = compute_soft_constraints();
+ $node->{threshold_prio} = compute_node_threshold_prio($node_stats, $vm_stats);
+ $node->{prio} = $group_members_prio->{$nodename};
+ $node->{online_node_usage} = $self->{online_node_usage}->{$nodename};
+ $node->{name} = $nodename;
+
+ return $node;
+}
+
+sub get_service_stats {
+ my ($self, $ss) = @_;
+
+ my $haenv = $self->{haenv};
+
+ foreach my $sid (sort keys %$ss) {
+
+ my (undef, $type, $vmid) = $haenv->parse_sid($sid);
+ $ss->{$sid}->{type} = $type;
+ $ss->{$sid}->{vmid} = $vmid;
+
+ my $stats = { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0, totalcpu => 0, totalcpuround => 0, memg => 0 };
+ $ss->{$sid}->{stats} = $stats;
+
+ #avoid computing all stats, as we currently only support recovery
+ next if $ss->{$sid}->{state} ne 'recovery';
+
+ my $vmconf = $haenv->read_vm_ct_config($vmid, $type);
+ $ss->{$sid}->{vmconf} = $vmconf;
+
+ #get vm/ct stats over the last 20 min (95th percentile)
+ $stats = $haenv->get_vm_rrd_stats($vmid, 95);
+
+ #windows vms fill memory with zeros at boot, so mem = maxmem
+ $stats->{mem} = $stats->{maxmem} if $vmconf && defined($vmconf->{ostype}) && $vmconf->{ostype} eq 'windows';
+
+ #totalcpu = cpu relative to 1 core: 50% of 4 cores = 200% of 1 core
+ $stats->{totalcpu} = $stats->{cpu} * 100 * $stats->{maxcpu};
+
+ #rounded by 10% step for ordering
+ $stats->{totalcpuround} = roundup($stats->{totalcpu}, 10);
+ #rounded by 1G step for ordering
+ $stats->{memg} = roundup(($stats->{mem} /1024 /1024 /1024), 1);
+
+ $ss->{$sid}->{stats} = $stats;
+ }
+}
+
+sub recompute_online_node_stats {
+ my ($self) = @_;
+
+ my $online_node_stats = {};
+ my $online_nodes = $self->{ns}->list_online_nodes();
+
+ foreach my $node (@$online_nodes) {
+ my $stats = $self->{haenv}->get_node_rrd_stats($node);
+ $stats->{cpu} = 0 if !defined($stats->{cpu});
+ $stats->{maxcpu} = 0 if !defined($stats->{maxcpu});
+ $stats->{mem} = 0 if !defined($stats->{mem});
+ $stats->{maxmem} = 0 if !defined($stats->{maxmem});
+ $stats->{totalcpu} = $stats->{cpu} * 100 * $stats->{maxcpu}; #how to handle different cpu model power ? bogomips ?
+ $stats->{totalfreecpu} = (100 * $stats->{maxcpu}) - $stats->{totalcpu};
+ $stats->{freemem} = $stats->{maxmem} - $stats->{mem};
+ $online_node_stats->{$node}->{stats} = $stats;
+ }
+
+ $self->{online_node_stats} = $online_node_stats;
+}
+
+## math helpers
+sub dotprod {
+ my($vec_a, $vec_b, $mode) = @_;
+ die "vectors must have the same size\n" unless @$vec_a == @$vec_b;
+ $mode = "" if !$mode;
+ my $sum = 0;
+ my $norm_a = 0;
+ my $norm_b = 0;
+
+ for(my $i=0; $i < scalar @{$vec_a}; $i++) {
+ my $a = @{$vec_a}[$i];
+ my $b = @{$vec_b}[$i];
+
+ $sum += $a * $b;
+ $norm_a += $a * $a;
+ $norm_b += $b * $b;
+ }
+
+ if($mode eq 'normR') {
+ return $sum / (sqrt($norm_a) * sqrt($norm_b))
+ } elsif ($mode eq 'normC') {
+ return $sum / $norm_b;
+ }
+ return $sum;
+}
+
+sub euclidean_distance {
+ my($vec_a, $vec_b) = @_;
+
+ my $sum = 0;
+
+ for(my $i=0; $i < scalar @{$vec_a}; $i++) {
+ my $a = @{$vec_a}[$i];
+ my $b = @{$vec_b}[$i];
+ $sum += ($b - $a)**2;
+ }
+
+ return sqrt($sum);
+}
+
+sub roundup {
+ my ($number, $round) = @_;
+
+ if ($number % $round) {
+ return (1 + int($number/$round)) * $round;
+ } else {
+ return $number;
+ }
+}
+
1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index ba731e5..396e8bd 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -109,6 +109,22 @@ sub read_service_config {
return $conf;
}
+sub read_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/service_stats";
+ my $conf = PVE::HA::Tools::read_json_from_file($filename);
+ return $conf;
+}
+
+sub read_node_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/node_stats";
+ my $conf = PVE::HA::Tools::read_json_from_file($filename);
+ return $conf;
+}
+
sub update_service_config {
my ($self, $sid, $param) = @_;
@@ -132,6 +148,24 @@ sub write_service_config {
return PVE::HA::Tools::write_json_to_file($filename, $conf);
}
+sub write_service_stats {
+ my ($self, $conf) = @_;
+
+ $self->{service_stats} = $conf;
+
+ my $filename = "$self->{statusdir}/service_stats";
+ return PVE::HA::Tools::write_json_to_file($filename, $conf);
+}
+
+sub write_node_stats {
+ my ($self, $conf) = @_;
+
+ $self->{node_stats} = $conf;
+
+ my $filename = "$self->{statusdir}/node_stats";
+ return PVE::HA::Tools::write_json_to_file($filename, $conf);
+}
+
sub read_fence_config {
my ($self) = @_;
@@ -382,6 +416,31 @@ sub new {
$self->write_service_config($conf);
}
+ if (-f "$testdir/service_stats") {
+ copy("$testdir/service_stats", "$statusdir/service_stats");
+ } else {
+ my $conf = {
+ '101' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ '102' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ '103' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ '104' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ '105' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ '106' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ };
+ $self->write_service_stats($conf);
+ }
+
+ if (-f "$testdir/node_stats") {
+ copy("$testdir/node_stats", "$statusdir/node_stats");
+ } else {
+ my $conf = {
+ 'node1' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ 'node2' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ 'node3' => { cpu => 0, maxcpu => 0, mem => 0, maxmem => 0 },
+ };
+ $self->write_node_stats($conf);
+ }
+
if (-f "$testdir/hardware_status") {
copy("$testdir/hardware_status", "$statusdir/hardware_status") ||
die "Copy failed: $!\n";
@@ -415,6 +474,8 @@ sub new {
}
$self->{service_config} = $self->read_service_config();
+ $self->{service_stats} = $self->read_service_stats();
+ $self->{node_stats} = $self->read_node_stats();
return $self;
}
diff --git a/src/PVE/HA/Sim/TestEnv.pm b/src/PVE/HA/Sim/TestEnv.pm
index 6718d8c..308478d 100644
--- a/src/PVE/HA/Sim/TestEnv.pm
+++ b/src/PVE/HA/Sim/TestEnv.pm
@@ -118,4 +118,52 @@ sub get_max_workers {
return 0;
}
-1;
+sub get_node_rrd_stats {
+ my ($self, $node) = @_;
+
+ my $nodestats = $self->{hardware}->{node_stats};
+ my $stats = $nodestats->{$node};
+
+ return $stats;
+}
+
+sub get_vm_rrd_stats {
+ my ($self, $vmid, $percentile) = @_;
+
+ my $vmstats = $self->{hardware}->{service_stats};
+ my $stats = $vmstats->{$vmid};
+
+ $stats->{cpu} = $stats->{cpu} || 0;
+ $stats->{mem} = $stats->{mem} || 0;
+ $stats->{maxmem} = $stats->{maxmem} || 0;
+ $stats->{maxcpu} = $stats->{maxcpu} || 0;
+ $stats->{totalcpu} = $stats->{cpu} * $stats->{maxcpu} * 100;
+
+ return $stats;
+}
+
+sub read_vm_config {
+ my ($self, $vmid) = @_;
+
+ return {};
+}
+
+sub read_ct_config {
+ my ($self, $vmid) = @_;
+
+ return {};
+}
+
+sub read_storecfg {
+ my ($self) = @_;
+
+ return {};
+}
+
+sub check_storage_availability {
+ my ($self, $vmid, $type, $node, $storecfg) = @_;
+
+ return 1;
+}
+
+1;
\ No newline at end of file
--
2.30.2
* [pve-devel] [PATCH V3 pve-ha-manager 2/2] add test-basic0
From: Alexandre Derumier @ 2021-12-21 15:13 UTC (permalink / raw)
To: pve-devel
---
src/test/test-basic0/README | 1 +
src/test/test-basic0/cmdlist | 4 +++
src/test/test-basic0/hardware_status | 5 +++
src/test/test-basic0/log.expect | 52 ++++++++++++++++++++++++++++
src/test/test-basic0/manager_status | 1 +
src/test/test-basic0/node_stats | 5 +++
src/test/test-basic0/service_config | 5 +++
src/test/test-basic0/service_stats | 5 +++
8 files changed, 78 insertions(+)
create mode 100644 src/test/test-basic0/README
create mode 100644 src/test/test-basic0/cmdlist
create mode 100644 src/test/test-basic0/hardware_status
create mode 100644 src/test/test-basic0/log.expect
create mode 100644 src/test/test-basic0/manager_status
create mode 100644 src/test/test-basic0/node_stats
create mode 100644 src/test/test-basic0/service_config
create mode 100644 src/test/test-basic0/service_stats
diff --git a/src/test/test-basic0/README b/src/test/test-basic0/README
new file mode 100644
index 0000000..223c9dc
--- /dev/null
+++ b/src/test/test-basic0/README
@@ -0,0 +1 @@
+Test failover after single node network failure.
\ No newline at end of file
diff --git a/src/test/test-basic0/cmdlist b/src/test/test-basic0/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-basic0/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node3 off" ]
+]
diff --git a/src/test/test-basic0/hardware_status b/src/test/test-basic0/hardware_status
new file mode 100644
index 0000000..119b81c
--- /dev/null
+++ b/src/test/test-basic0/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off" },
+ "node2": { "power": "off", "network": "off" },
+ "node3": { "power": "off", "network": "off" }
+}
\ No newline at end of file
diff --git a/src/test/test-basic0/log.expect b/src/test/test-basic0/log.expect
new file mode 100644
index 0000000..c61850d
--- /dev/null
+++ b/src/test/test-basic0/log.expect
@@ -0,0 +1,52 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node2'
+info 20 node1/crm: adding new service 'vm:103' on node 'node3'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:103
+info 25 node3/lrm: service status vm:103 started
+info 40 node1/crm: service 'vm:102': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute network node3 off
+info 120 node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info 124 node3/crm: status change slave => wait_for_quorum
+info 125 node3/lrm: status change active => lost_agent_lock
+info 160 node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info 160 node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai 160 node1/crm: FENCE: Try to fence node 'node3'
+info 166 watchdog: execute power node3 off
+info 165 node3/crm: killed by poweroff
+info 166 node3/lrm: killed by poweroff
+info 166 hardware: server 'node3' stopped by poweroff (watchdog)
+info 240 node1/crm: got lock 'ha_agent_node3_lock'
+info 240 node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai 240 node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info 240 node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info 240 node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info 240 node1/crm: service 'vm:103': state changed from 'recovery' to 'started' (node = node2)
+info 243 node2/lrm: starting service vm:103
+info 243 node2/lrm: service status vm:103 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-basic0/manager_status b/src/test/test-basic0/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-basic0/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-basic0/node_stats b/src/test/test-basic0/node_stats
new file mode 100644
index 0000000..a5e654d
--- /dev/null
+++ b/src/test/test-basic0/node_stats
@@ -0,0 +1,5 @@
+{
+ "node1": { "cpu": 0.1, "maxcpu": 1,"mem": 20737418239,"maxmem": 107374182400 },
+ "node2": { "cpu": 0.1, "maxcpu": 1,"mem": 22737418240,"maxmem": 107374182400 },
+ "node3": { "cpu": 0.2, "maxcpu": 1,"mem": 10737418240,"maxmem": 107374182400 }
+}
\ No newline at end of file
diff --git a/src/test/test-basic0/service_config b/src/test/test-basic0/service_config
new file mode 100644
index 0000000..0e05ab4
--- /dev/null
+++ b/src/test/test-basic0/service_config
@@ -0,0 +1,5 @@
+{
+ "vm:101": { "node": "node1", "state": "enabled" },
+ "vm:102": { "node": "node2" },
+ "vm:103": { "node": "node3", "state": "enabled" }
+}
\ No newline at end of file
diff --git a/src/test/test-basic0/service_stats b/src/test/test-basic0/service_stats
new file mode 100644
index 0000000..52f506d
--- /dev/null
+++ b/src/test/test-basic0/service_stats
@@ -0,0 +1,5 @@
+{
+ "101": { "cpu": 0.1, "maxcpu": 1,"mem": 1073741824,"maxmem": 1073741824 },
+ "102": { "cpu": 0.1, "maxcpu": 1,"mem": 1073741824,"maxmem": 1073741824 },
+ "103": { "cpu": 0.1, "maxcpu": 1,"mem": 1073741824,"maxmem": 1073741824 }
+}
\ No newline at end of file
--
2.30.2