public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
@ 2022-11-17 14:00 Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
                   ` (17 more replies)
  0 siblings, 18 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.

With this version, the effect is limited to choosing nodes during
recovery or for migrations triggered by a shutdown policy, but the
plan is to extend this in the future.

As a next step, it would be nice to also have this for startup, but
AFAICT the issue is that the node selection only happens after the
state has already been set to 'started', and I think
select_service_node() currently doesn't know whether a service has
been newly started. I haven't looked into it in too much detail
though.

An idea to get a balancer out of it is to (see the sketch below):
1. (optionally) sort all services by badness (needs a new backend function)
2. iterate scoring the nodes for each service, adding the usage to the
   chosen node after each iteration. The current node can be kept if its
   score doesn't differ too much from the best node's.
3. record the chosen nodes and migrate the services accordingly.
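
Very roughly, such a loop could look like the following sketch. Note
that sort_services_by_badness(), close_enough() and queue_migration()
are hypothetical helpers that don't exist yet; only the Usage API
calls are the ones introduced by this series:

    my @services = sort_services_by_badness($usage, $service_node); # 1.
    my $migrations = {};
    for my $sid (@services) { # 2.
        my $current = $service_node->{$sid};
        my $scores = $usage->score_nodes_to_start_service($sid, $current);
        my ($best) = sort { $scores->{$a} <=> $scores->{$b} || $a cmp $b }
            keys $scores->%*;
        # keep the current node if its score is close enough to the best one
        my $target = close_enough($scores, $current, $best) ? $current : $best;
        $usage->add_service_usage_to_node($target, $sid, $current);
        $migrations->{$sid} = $target if $target ne $current;
    }
    queue_migration($_, $migrations->{$_}) for sort keys $migrations->%*; # 3.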


The online node usage calculation is factored out into a 'Usage'
plugin system to ease adding the new static mode without much
clutter. If not all nodes provide static usage information, we fall
back to the 'basic' mode. If only the scoring fails, the service
count is used as a fallback.


Dependency bumps needed:
proxmox-ha-manager (build-)depends on proxmox-perl-rs
The new feature is only usable with an updated pve-manager and
pve-cluster of course, but there is no hard dependency.


Changes from v1:
    * Drop already applied patches.
    * Add tests for HA manager which also required properly adding
      relevant methods to the simulation environment.
    * Implement fallback for scoring in Usage/Static.pm.
    * Improve documentation and mention current limitation with many
      services.


ha-manager:

Fiona Ebner (15):
  env: add get_static_node_stats() method
  resources: add get_static_stats() method
  add Usage base plugin and Usage::Basic plugin
  manager: select service node: add $sid to parameters
  manager: online node usage: switch to Usage::Basic plugin
  usage: add Usage::Static plugin
  env: rename get_ha_settings to get_datacenter_settings
  env: datacenter config: include crs (cluster-resource-scheduling)
    setting
  manager: set resource scheduler mode upon init
  manager: use static resource scheduler when configured
  manager: avoid scoring nodes if maintenance fallback node is valid
  manager: avoid scoring nodes when not trying next and current node is
    valid
  usage: static: use service count on nodes as a fallback
  test: add tests for static resource scheduling
  resources: add missing PVE::Cluster use statements

 debian/pve-ha-manager.install                 |   3 +
 src/PVE/HA/Env.pm                             |  10 +-
 src/PVE/HA/Env/PVE2.pm                        |  27 ++-
 src/PVE/HA/LRM.pm                             |   4 +-
 src/PVE/HA/Makefile                           |   3 +-
 src/PVE/HA/Manager.pm                         |  79 +++++---
 src/PVE/HA/Resources.pm                       |   5 +
 src/PVE/HA/Resources/PVECT.pm                 |  13 ++
 src/PVE/HA/Resources/PVEVM.pm                 |  16 ++
 src/PVE/HA/Sim/Env.pm                         |  13 +-
 src/PVE/HA/Sim/Hardware.pm                    |  28 +++
 src/PVE/HA/Sim/Resources.pm                   |  10 +
 src/PVE/HA/Usage.pm                           |  50 +++++
 src/PVE/HA/Usage/Basic.pm                     |  52 ++++++
 src/PVE/HA/Usage/Makefile                     |   6 +
 src/PVE/HA/Usage/Static.pm                    | 120 ++++++++++++
 src/test/test-crs-static1/README              |   4 +
 src/test/test-crs-static1/cmdlist             |   4 +
 src/test/test-crs-static1/datacenter.cfg      |   6 +
 src/test/test-crs-static1/hardware_status     |   5 +
 src/test/test-crs-static1/log.expect          |  50 +++++
 src/test/test-crs-static1/manager_status      |   1 +
 src/test/test-crs-static1/service_config      |   3 +
 .../test-crs-static1/static_service_stats     |   3 +
 src/test/test-crs-static2/README              |   4 +
 src/test/test-crs-static2/cmdlist             |  20 ++
 src/test/test-crs-static2/datacenter.cfg      |   6 +
 src/test/test-crs-static2/groups              |   2 +
 src/test/test-crs-static2/hardware_status     |   7 +
 src/test/test-crs-static2/log.expect          | 171 ++++++++++++++++++
 src/test/test-crs-static2/manager_status      |   1 +
 src/test/test-crs-static2/service_config      |   3 +
 .../test-crs-static2/static_service_stats     |   3 +
 src/test/test-crs-static3/README              |   5 +
 src/test/test-crs-static3/cmdlist             |   4 +
 src/test/test-crs-static3/datacenter.cfg      |   9 +
 src/test/test-crs-static3/hardware_status     |   5 +
 src/test/test-crs-static3/log.expect          | 131 ++++++++++++++
 src/test/test-crs-static3/manager_status      |   1 +
 src/test/test-crs-static3/service_config      |  12 ++
 .../test-crs-static3/static_service_stats     |  12 ++
 src/test/test-crs-static4/README              |   6 +
 src/test/test-crs-static4/cmdlist             |   4 +
 src/test/test-crs-static4/datacenter.cfg      |   9 +
 src/test/test-crs-static4/hardware_status     |   5 +
 src/test/test-crs-static4/log.expect          | 149 +++++++++++++++
 src/test/test-crs-static4/manager_status      |   1 +
 src/test/test-crs-static4/service_config      |  12 ++
 .../test-crs-static4/static_service_stats     |  12 ++
 src/test/test-crs-static5/README              |   5 +
 src/test/test-crs-static5/cmdlist             |   4 +
 src/test/test-crs-static5/datacenter.cfg      |   9 +
 src/test/test-crs-static5/hardware_status     |   5 +
 src/test/test-crs-static5/log.expect          | 117 ++++++++++++
 src/test/test-crs-static5/manager_status      |   1 +
 src/test/test-crs-static5/service_config      |  10 +
 .../test-crs-static5/static_service_stats     |  11 ++
 src/test/test_failover1.pl                    |  21 ++-
 58 files changed, 1242 insertions(+), 50 deletions(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile
 create mode 100644 src/PVE/HA/Usage/Static.pm
 create mode 100644 src/test/test-crs-static1/README
 create mode 100644 src/test/test-crs-static1/cmdlist
 create mode 100644 src/test/test-crs-static1/datacenter.cfg
 create mode 100644 src/test/test-crs-static1/hardware_status
 create mode 100644 src/test/test-crs-static1/log.expect
 create mode 100644 src/test/test-crs-static1/manager_status
 create mode 100644 src/test/test-crs-static1/service_config
 create mode 100644 src/test/test-crs-static1/static_service_stats
 create mode 100644 src/test/test-crs-static2/README
 create mode 100644 src/test/test-crs-static2/cmdlist
 create mode 100644 src/test/test-crs-static2/datacenter.cfg
 create mode 100644 src/test/test-crs-static2/groups
 create mode 100644 src/test/test-crs-static2/hardware_status
 create mode 100644 src/test/test-crs-static2/log.expect
 create mode 100644 src/test/test-crs-static2/manager_status
 create mode 100644 src/test/test-crs-static2/service_config
 create mode 100644 src/test/test-crs-static2/static_service_stats
 create mode 100644 src/test/test-crs-static3/README
 create mode 100644 src/test/test-crs-static3/cmdlist
 create mode 100644 src/test/test-crs-static3/datacenter.cfg
 create mode 100644 src/test/test-crs-static3/hardware_status
 create mode 100644 src/test/test-crs-static3/log.expect
 create mode 100644 src/test/test-crs-static3/manager_status
 create mode 100644 src/test/test-crs-static3/service_config
 create mode 100644 src/test/test-crs-static3/static_service_stats
 create mode 100644 src/test/test-crs-static4/README
 create mode 100644 src/test/test-crs-static4/cmdlist
 create mode 100644 src/test/test-crs-static4/datacenter.cfg
 create mode 100644 src/test/test-crs-static4/hardware_status
 create mode 100644 src/test/test-crs-static4/log.expect
 create mode 100644 src/test/test-crs-static4/manager_status
 create mode 100644 src/test/test-crs-static4/service_config
 create mode 100644 src/test/test-crs-static4/static_service_stats
 create mode 100644 src/test/test-crs-static5/README
 create mode 100644 src/test/test-crs-static5/cmdlist
 create mode 100644 src/test/test-crs-static5/datacenter.cfg
 create mode 100644 src/test/test-crs-static5/hardware_status
 create mode 100644 src/test/test-crs-static5/log.expect
 create mode 100644 src/test/test-crs-static5/manager_status
 create mode 100644 src/test/test-crs-static5/service_config
 create mode 100644 src/test/test-crs-static5/static_service_stats


docs:

Fiona Ebner (2):
  ha: add section about scheduler modes
  ha: add warning against using 'static' mode with many services

 ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

to be used for static resource scheduling. In the simulation
environment, the information can be added in hardware_status.
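
For illustration, a hardware_status providing this information could
then look like the following (made-up values; memory is in bytes):

    {
      "node1": { "power": "off", "network": "off", "cpus": 8, "memory": 34359738368 },
      "node2": { "power": "off", "network": "off", "cpus": 4, "memory": 17179869184 }
    }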

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Properly add it to the simulation environment.

 src/PVE/HA/Env.pm          |  6 ++++++
 src/PVE/HA/Env/PVE2.pm     | 13 +++++++++++++
 src/PVE/HA/Sim/Env.pm      |  6 ++++++
 src/PVE/HA/Sim/Hardware.pm | 13 +++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index ac569a9..00e3e3c 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -269,4 +269,10 @@ sub get_ha_settings {
     return $self->{plug}->get_ha_settings();
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    return $self->{plug}->get_static_node_stats();
+}
+
 1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 5e0a683..7cecf35 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -5,6 +5,7 @@ use warnings;
 use POSIX qw(:errno_h :fcntl_h);
 use IO::File;
 use IO::Socket::UNIX;
+use JSON;
 
 use PVE::SafeSyslog;
 use PVE::Tools;
@@ -459,4 +460,16 @@ sub get_ha_settings {
     return $datacenterconfig->{ha};
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    my $stats = PVE::Cluster::get_node_kv('static-info');
+    for my $node (keys $stats->%*) {
+	$stats->{$node} = eval { decode_json($stats->{$node}) };
+	$self->log('err', "unable to decode static node info for '$node' - $@") if $@;
+    }
+
+    return $stats;
+}
+
 1;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index b286708..6bd35b3 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -433,4 +433,10 @@ sub get_ha_settings {
     return $datacenterconfig->{ha};
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    return $self->{hardware}->get_static_node_stats();
+}
+
 1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 96a4064..e38561a 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -858,4 +858,17 @@ sub watchdog_update {
     return &$modify_watchog($self, $code);
 }
 
+sub get_static_node_stats {
+    my ($self) = @_;
+
+    my $cstatus = $self->read_hardware_status_nolock();
+
+    my $stats = {};
+    for my $node (keys $cstatus->%*) {
+	$stats->{$node} = { $cstatus->{$node}->%{qw(cpus memory)} };
+    }
+
+    return $stats;
+}
+
 1;
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

to be used for static resource scheduling.

In a container's vmstatus(), the 'cores' option takes precedence over
the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
here.
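
As a made-up example, a container configured with 'cores: 4',
'cpulimit: 2' and 'memory: 1024' would end up with the following
static stats (mirroring the code added below):

    maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,  # 2 rather than 4
    maxmem => ($conf->{memory} || 512) * 1024 * 1024,    # 1024 MiB in bytes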

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Properly add it to the simulation environment.

 src/PVE/HA/Resources.pm       |  5 +++++
 src/PVE/HA/Resources/PVECT.pm | 11 +++++++++++
 src/PVE/HA/Resources/PVEVM.pm | 14 ++++++++++++++
 src/PVE/HA/Sim/Hardware.pm    | 15 +++++++++++++++
 src/PVE/HA/Sim/Resources.pm   | 10 ++++++++++
 5 files changed, 55 insertions(+)

diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 835c314..7ba90f6 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -161,6 +161,11 @@ sub remove_locks {
     die "implement in subclass";
 }
 
+sub get_static_stats {
+    my ($class, $haenv, $id, $service_node) = @_;
+
+    die "implement in subclass";
+}
 
 # package PVE::HA::Resources::IPAddr;
 
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index 015faf3..4c9530d 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -150,4 +150,15 @@ sub remove_locks {
     return undef;
 }
 
+sub get_static_stats {
+    my ($class, $haenv, $id, $service_node) = @_;
+
+    my $conf = PVE::LXC::Config->load_config($id, $service_node);
+
+    return {
+	maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
+	maxmem => ($conf->{memory} || 512) * 1024 * 1024,
+    };
+}
+
 1;
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index 58c83e0..49e4a1d 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -173,4 +173,18 @@ sub remove_locks {
     return undef;
 }
 
+sub get_static_stats {
+    my ($class, $haenv, $id, $service_node) = @_;
+
+    my $conf = PVE::QemuConfig->load_config($id, $service_node);
+    my $defaults = PVE::QemuServer::load_defaults();
+
+    my $cpus = ($conf->{sockets} || $defaults->{sockets}) * ($conf->{cores} || $defaults->{cores});
+
+    return {
+	maxcpu => $conf->{vcpus} || $cpus,
+	maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,
+    };
+}
+
 1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index e38561a..e33a4c5 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -29,6 +29,7 @@ my $watchdog_timeout = 60;
 # $testdir/hardware_status            Hardware description (number of nodes, ...)
 # $testdir/manager_status             CRM status (start with {})
 # $testdir/service_config             Service configuration
+# $testdir/static_service_stats       Static service usage information (cpu, memory)
 # $testdir/groups                     HA groups configuration
 # $testdir/service_status_<node>      Service status
 # $testdir/datacenter.cfg             Datacenter wide HA configuration
@@ -38,6 +39,7 @@ my $watchdog_timeout = 60;
 #
 # $testdir/status/cluster_locks        Cluster locks
 # $testdir/status/hardware_status      Hardware status (power/network on/off)
+# $testdir/status/static_service_stats Static service usage information (cpu, memory)
 # $testdir/status/watchdog_status      Watchdog status
 #
 # runtime status
@@ -330,6 +332,15 @@ sub write_service_status {
     return $res;
 }
 
+sub read_static_service_stats {
+    my ($self) = @_;
+
+    my $filename = "$self->{statusdir}/static_service_stats";
+    my $stats = PVE::HA::Tools::read_json_from_file($filename);
+
+    return $stats;
+}
+
 my $default_group_config = <<__EOD;
 group: prefer_node1
     nodes node1
@@ -404,6 +415,10 @@ sub new {
 	copy("$testdir/datacenter.cfg", "$statusdir/datacenter.cfg");
     }
 
+    if (-f "$testdir/static_service_stats") {
+	copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
+    }
+
     my $cstatus = $self->read_hardware_status_nolock();
 
     foreach my $node (sort keys %$cstatus) {
diff --git a/src/PVE/HA/Sim/Resources.pm b/src/PVE/HA/Sim/Resources.pm
index bccc0e6..e6e1853 100644
--- a/src/PVE/HA/Sim/Resources.pm
+++ b/src/PVE/HA/Sim/Resources.pm
@@ -139,4 +139,14 @@ sub remove_locks {
     return undef;
 }
 
+sub get_static_stats {
+    my ($class, $haenv, $id, $service_node) = @_;
+
+    my $sid = $class->type() . ":$id";
+    my $hardware = $haenv->hardware();
+
+    my $stats = $hardware->read_static_service_stats();
+    return $stats->{$sid};
+}
+
 1;
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

in preparation for also supporting static resource scheduling via
another such Usage plugin.

The interface is designed in anticipation of the Usage::Static plugin;
the Usage::Basic plugin doesn't require all parameters.

In Usage::Static, the $haenv will be necessary for logging and for
getting the static node stats. add_service_usage_to_node() and
score_nodes_to_start_service() take the $sid and service node, and the
former additionally takes the optional migration target (during a
migration it's not clear whether the config file has already been
moved or not), to be able to get the static service stats.
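
For illustration, a minimal (made-up) consumer of the resulting
interface, here with the Basic plugin and $haenv as provided by the
manager (which Basic doesn't actually use):

    use PVE::HA::Usage::Basic;

    my $usage = PVE::HA::Usage::Basic->new($haenv);
    $usage->add_node($_) for qw(node1 node2 node3);
    # Basic simply counts one usage per service on the given node
    $usage->add_service_usage_to_node('node1', 'vm:100', 'node1');
    # lower is better: node2 and node3 score 0 here, node1 scores 1
    my $scores = $usage->score_nodes_to_start_service('vm:101', 'node1');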

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 debian/pve-ha-manager.install |  2 ++
 src/PVE/HA/Makefile           |  3 +-
 src/PVE/HA/Usage.pm           | 49 +++++++++++++++++++++++++++++++++
 src/PVE/HA/Usage/Basic.pm     | 52 +++++++++++++++++++++++++++++++++++
 src/PVE/HA/Usage/Makefile     |  6 ++++
 5 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile

diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 33a5c58..87fb24c 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,5 +33,7 @@
 /usr/share/perl5/PVE/HA/Resources/PVECT.pm
 /usr/share/perl5/PVE/HA/Resources/PVEVM.pm
 /usr/share/perl5/PVE/HA/Tools.pm
+/usr/share/perl5/PVE/HA/Usage.pm
+/usr/share/perl5/PVE/HA/Usage/Basic.pm
 /usr/share/perl5/PVE/Service/pve_ha_crm.pm
 /usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index c366f6c..8c91b97 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,5 +1,5 @@
 SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
-	NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm
+	NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
 
 SOURCES=${SIM_SOURCES} Config.pm
 
@@ -8,6 +8,7 @@ install:
 	install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
 	for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
 	make -C Resources install
+	make -C Usage install
 	make -C Env install
 
 .PHONY: installsim
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
new file mode 100644
index 0000000..4c723d1
--- /dev/null
+++ b/src/PVE/HA/Usage.pm
@@ -0,0 +1,49 @@
+package PVE::HA::Usage;
+
+use strict;
+use warnings;
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    die "implement in subclass";
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    die "implement in subclass";
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    die "implement in subclass";
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    die "implement in subclass";
+}
+
+# Returns a hash with $nodename => $score pairs. A lower $score is better.
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    die "implement in subclass";
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
new file mode 100644
index 0000000..f066350
--- /dev/null
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -0,0 +1,52 @@
+package PVE::HA::Usage::Basic;
+
+use strict;
+use warnings;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    return bless {
+	nodes => {},
+    }, $class;
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    $self->{nodes}->{$nodename} = 0;
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    delete $self->{nodes}->{$nodename};
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    return keys $self->{nodes}->%*;
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    return defined($self->{nodes}->{$nodename});
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    $self->{nodes}->{$nodename}++;
+}
+
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    return $self->{nodes};
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
new file mode 100644
index 0000000..ccf1282
--- /dev/null
+++ b/src/PVE/HA/Usage/Makefile
@@ -0,0 +1,6 @@
+SOURCES=Basic.pm
+
+.PHONY: install
+install:
+	install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Usage
+	for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Usage/$$i; done
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (2 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

In preparation for scheduling based on static information, where the
scoring of nodes depends on information from the service's
VM/CT configuration file (and the $sid is required to query that).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 src/PVE/HA/Manager.pm      | 4 +++-
 src/test/test_failover1.pl | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 518f64f..63c94af 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -119,7 +119,7 @@ sub get_node_priority_groups {
 }
 
 sub select_service_node {
-    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
+    my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
 
     my $group = get_service_group($groups, $online_node_usage, $service_conf);
 
@@ -766,6 +766,7 @@ sub next_state_started {
 	    my $node = select_service_node(
 	        $self->{groups},
 		$self->{online_node_usage},
+		$sid,
 		$cd,
 		$sd->{node},
 		$try_next,
@@ -847,6 +848,7 @@ sub next_state_recovery {
     my $recovery_node = select_service_node(
 	$self->{groups},
 	$self->{online_node_usage},
+	$sid,
 	$cd,
 	$sd->{node},
     );
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 67573a2..f11d1a6 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -30,7 +30,7 @@ sub test {
     my ($expected_node, $try_next) = @_;
     
     my $node = PVE::HA::Manager::select_service_node
-	($groups, $online_node_usage, $service_conf, $current_node, $try_next);
+	($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
 
     my (undef, undef, $line) = caller();
     die "unexpected result: $node != ${expected_node} at line $line\n" 
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (3 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

no functional change is intended.

One test needs adaptation too, because it created its own version of
$online_node_usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 src/PVE/HA/Manager.pm      | 35 +++++++++++++++++------------------
 src/test/test_failover1.pl | 19 ++++++++++---------
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63c94af..63e6c8a 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -7,6 +7,7 @@ use Digest::MD5 qw(md5_base64);
 use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
+use PVE::HA::Usage::Basic;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -77,9 +78,7 @@ sub get_service_group {
 
     my $group = {};
     # add all online nodes to default group to allow try_next when no group set
-    foreach my $node (keys %$online_node_usage) {
-	$group->{nodes}->{$node} = 1;
-    }
+    $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
 
     # overwrite default if service is bound to a specific group
     if (my $group_id = $service_conf->{group}) {
@@ -100,7 +99,7 @@ sub get_node_priority_groups {
 	if ($entry =~ m/^(\S+):(\d+)$/) {
 	    ($node, $pri) = ($1, $2);
 	}
-	next if !defined($online_node_usage->{$node}); # offline
+	next if !$online_node_usage->contains_node($node); # offline
 	$pri_groups->{$pri}->{$node} = 1;
 	$group_members->{$node} = $pri;
     }
@@ -108,7 +107,7 @@ sub get_node_priority_groups {
     # add non-group members to unrestricted groups (priority -1)
     if (!$group->{restricted}) {
 	my $pri = -1;
-	foreach my $node (keys %$online_node_usage) {
+	for my $node ($online_node_usage->list_nodes()) {
 	    next if defined($group_members->{$node});
 	    $pri_groups->{$pri}->{$node} = 1;
 	    $group_members->{$node} = -1;
@@ -144,8 +143,9 @@ sub select_service_node {
 	}
     }
 
+    my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
-	$online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b
+	$scores->{$a} <=> $scores->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
 
     my $found;
@@ -201,39 +201,38 @@ my $valid_service_states = {
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = {};
+    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    foreach my $node (@$online_nodes) {
-	$online_node_usage->{$node} = 0;
-    }
+    $online_node_usage->add_node($_) for $online_nodes->@*;
 
     foreach my $sid (keys %{$self->{ss}}) {
 	my $sd = $self->{ss}->{$sid};
 	my $state = $sd->{state};
 	my $target = $sd->{target}; # optional
-	if (defined($online_node_usage->{$sd->{node}})) {
+	if ($online_node_usage->contains_node($sd->{node})) {
 	    if (
 		$state eq 'started' || $state eq 'request_stop' || $state eq 'fence' ||
 		$state eq 'freeze' || $state eq 'error' || $state eq 'recovery'
 	    ) {
-		$online_node_usage->{$sd->{node}}++;
+		$online_node_usage->add_service_usage_to_node($sd->{node}, $sid, $sd->{node});
 	    } elsif (($state eq 'migrate') || ($state eq 'relocate')) {
+		my $source = $sd->{node};
 		# count it for both, source and target as load is put on both
-		$online_node_usage->{$sd->{node}}++;
-		$online_node_usage->{$target}++;
+		$online_node_usage->add_service_usage_to_node($source, $sid, $source, $target);
+		$online_node_usage->add_service_usage_to_node($target, $sid, $source, $target);
 	    } elsif ($state eq 'stopped') {
 		# do nothing
 	    } else {
 		die "should not be reached (sid = '$sid', state = '$state')";
 	    }
-	} elsif (defined($target) && defined($online_node_usage->{$target})) {
+	} elsif (defined($target) && $online_node_usage->contains_node($target)) {
 	    if ($state eq 'migrate' || $state eq 'relocate') {
 		# to correctly track maintenance modi and also consider the target as used for the
 		# case a node dies, as we cannot really know if the to-be-aborted incoming migration
 		# has already cleaned up all used resources
-		$online_node_usage->{$target}++;
+		$online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node}, $target);
 	    }
 	}
     }
@@ -775,7 +774,7 @@ sub next_state_started {
 	    );
 
 	    if ($node && ($sd->{node} ne $node)) {
-		$self->{online_node_usage}->{$node}++;
+		$self->{online_node_usage}->add_service_usage_to_node($node, $sid, $sd->{node});
 
 		if (defined(my $fallback = $sd->{maintenance_node})) {
 		    if ($node eq $fallback) {
@@ -864,7 +863,7 @@ sub next_state_recovery {
 	$fence_recovery_cleanup->($self, $sid, $fenced_node);
 
 	$haenv->steal_service($sid, $sd->{node}, $recovery_node);
-	$self->{online_node_usage}->{$recovery_node}++;
+	$self->{online_node_usage}->add_service_usage_to_node($recovery_node, $sid, $recovery_node);
 
 	# NOTE: $sd *is normally read-only*, fencing is the exception
 	$cd->{node} = $sd->{node} = $recovery_node;
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index f11d1a6..308eab3 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -6,6 +6,7 @@ use warnings;
 use lib '..';
 use PVE::HA::Groups;
 use PVE::HA::Manager;
+use PVE::HA::Usage::Basic;
 
 my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
 group: prefer_node1
@@ -13,11 +14,11 @@ group: prefer_node1
 EOD
 
 
-my $online_node_usage = {
-    node1 => 0,
-    node2 => 0,
-    node3 => 0,
-};
+# Relies on the fact that the basic plugin doesn't use the haenv.
+my $online_node_usage = PVE::HA::Usage::Basic->new();
+$online_node_usage->add_node("node1");
+$online_node_usage->add_node("node2");
+$online_node_usage->add_node("node3");
 
 my $service_conf = {
     node => 'node1',
@@ -43,22 +44,22 @@ sub test {
 test('node1');
 test('node1', 1);
 
-delete $online_node_usage->{node1}; # poweroff
+$online_node_usage->remove_node("node1"); # poweroff
 
 test('node2');
 test('node3', 1);
 test('node2', 1);
 
-delete $online_node_usage->{node2}; # poweroff
+$online_node_usage->remove_node("node2"); # poweroff
 
 test('node3');
 test('node3', 1);
 
-$online_node_usage->{node1} = 0; # poweron
+$online_node_usage->add_node("node1"); # poweron
 
 test('node1');
 
-$online_node_usage->{node2} = 0; # poweron
+$online_node_usage->add_node("node2"); # poweron
 
 test('node1');
 test('node1', 1);
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (4 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

for calculating node usage of services based upon static CPU and
memory configuration as well as scoring the nodes with that
information to decide where to start a new or recovered service.

For getting the service stats, it's necessary to also consider the
migration target (if present), because the configuration file might
have already moved.

It's necessary to update the cluster filesystem upon stealing the
service to be able to always read the moved config right away when
adding the usage.
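
For reference, the underlying scheduler from proxmox-perl-rs is used
roughly as follows (made-up values; the calls mirror the ones in the
diff below):

    use PVE::RS::ResourceScheduling::Static;

    my $scheduler = PVE::RS::ResourceScheduling::Static->new();
    $scheduler->add_node("node1", 8, 32 * 1024 ** 3); # CPU count, memory in bytes
    $scheduler->add_node("node2", 4, 16 * 1024 ** 3);
    my $service = { maxcpu => 2, maxmem => 4 * 1024 ** 3 };
    $scheduler->add_service_usage_to_node("node1", $service);
    # list of [$node, $score] pairs; a higher TOPSIS score is better,
    # which is why the plugin below negates it for its callers
    my $score_list = $scheduler->score_nodes_to_start_service($service);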

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Pass haenv to resource's get_static_stats(), required by
      simulation env.

 debian/pve-ha-manager.install |   1 +
 src/PVE/HA/Env/PVE2.pm        |   4 ++
 src/PVE/HA/Usage.pm           |   1 +
 src/PVE/HA/Usage/Makefile     |   2 +-
 src/PVE/HA/Usage/Static.pm    | 114 ++++++++++++++++++++++++++++++++++
 5 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 src/PVE/HA/Usage/Static.pm

diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 87fb24c..a7598a9 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -35,5 +35,6 @@
 /usr/share/perl5/PVE/HA/Tools.pm
 /usr/share/perl5/PVE/HA/Usage.pm
 /usr/share/perl5/PVE/HA/Usage/Basic.pm
+/usr/share/perl5/PVE/HA/Usage/Static.pm
 /usr/share/perl5/PVE/Service/pve_ha_crm.pm
 /usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7cecf35..7fac43c 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -176,6 +176,10 @@ sub steal_service {
     } else {
 	die "implement me";
     }
+
+    # Necessary for (at least) static usage plugin to always be able to read service config from new
+    # node right away.
+    $self->cluster_state_update();
 }
 
 sub read_group_config {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 4c723d1..66d9572 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,6 +33,7 @@ sub contains_node {
     die "implement in subclass";
 }
 
+# Logs a warning to $haenv upon failure, but does not die.
 sub add_service_usage_to_node {
     my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
 
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index ccf1282..5a51359 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Basic.pm
+SOURCES=Basic.pm Static.pm
 
 .PHONY: install
 install:
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
new file mode 100644
index 0000000..ce705eb
--- /dev/null
+++ b/src/PVE/HA/Usage/Static.pm
@@ -0,0 +1,114 @@
+package PVE::HA::Usage::Static;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Static;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+    my ($class, $haenv) = @_;
+
+    my $node_stats = eval { $haenv->get_static_node_stats() };
+    die "did not get static node usage information - $@" if $@;
+
+    my $scheduler = eval { PVE::RS::ResourceScheduling::Static->new(); };
+    die "unable to initialize static scheduling - $@" if $@;
+
+    return bless {
+	'node-stats' => $node_stats,
+	'service-stats' => {},
+	haenv => $haenv,
+	scheduler => $scheduler,
+    }, $class;
+}
+
+sub add_node {
+    my ($self, $nodename) = @_;
+
+    my $stats = $self->{'node-stats'}->{$nodename}
+	or die "did not get static node usage information for '$nodename'\n";
+    die "static node usage information for '$nodename' missing cpu count\n" if !$stats->{cpus};
+    die "static node usage information for '$nodename' missing memory\n" if !$stats->{memory};
+
+    eval { $self->{scheduler}->add_node($nodename, int($stats->{cpus}), int($stats->{memory})); };
+    die "initializing static node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+    my ($self, $nodename) = @_;
+
+    $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+    my ($self) = @_;
+
+    return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+    my ($self, $nodename) = @_;
+
+    return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+    my ($self, $sid, $service_node, $migration_target) = @_;
+
+    return $self->{'service-stats'}->{$sid} if $self->{'service-stats'}->{$sid};
+
+    my (undef, $type, $id) = $self->{haenv}->parse_sid($sid);
+    my $plugin = PVE::HA::Resources->lookup($type);
+
+    my $stats = eval { $plugin->get_static_stats($self->{haenv}, $id, $service_node); };
+    if (my $err = $@) {
+	# config might've already moved during a migration
+	$stats = eval { $plugin->get_static_stats($self->{haenv}, $id, $migration_target); } if $migration_target;
+	die "did not get static service usage information for '$sid' - $err\n" if !$stats;
+    }
+
+    my $service_stats = {
+	maxcpu => $stats->{maxcpu} + 0.0, # containers allow non-integer cpulimit
+	maxmem => int($stats->{maxmem}),
+    };
+
+    $self->{'service-stats'}->{$sid} = $service_stats;
+
+    return $service_stats;
+}
+
+sub add_service_usage_to_node {
+    my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+    eval {
+	my $service_usage = get_service_usage($self, $sid, $service_node, $migration_target);
+	$self->{scheduler}->add_service_usage_to_node($nodename, $service_usage);
+    };
+    $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
+	if $@;
+}
+
+sub score_nodes_to_start_service {
+    my ($self, $sid, $service_node) = @_;
+
+    my $score_list = eval {
+	my $service_usage = get_service_usage($self, $sid, $service_node);
+	$self->{scheduler}->score_nodes_to_start_service($service_usage);
+    };
+    if (my $err = $@) {
+	$self->{haenv}->log(
+	    'err',
+	    "unable to score nodes according to static usage for service '$sid' - $err",
+	);
+	# TODO maybe use service count as fallback?
+	return { map { $_ => 1 } $self->list_nodes() };
+    }
+
+    # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+    return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (5 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

The method will be extended to include other HA-relevant settings from
datacenter.cfg.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/PVE/HA/Env.pm      | 4 ++--
 src/PVE/HA/Env/PVE2.pm | 2 +-
 src/PVE/HA/LRM.pm      | 2 +-
 src/PVE/HA/Sim/Env.pm  | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 00e3e3c..16603ec 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -263,10 +263,10 @@ sub get_max_workers {
 }
 
 # return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
     my ($self) = @_;
 
-    return $self->{plug}->get_ha_settings();
+    return $self->{plug}->get_datacenter_settings();
 }
 
 sub get_static_node_stats {
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7fac43c..d2c46e8 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -452,7 +452,7 @@ sub get_max_workers {
 }
 
 # return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
     my ($self) = @_;
 
     my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') };
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 8cbdb82..7750f4d 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -59,7 +59,7 @@ sub shutdown_request {
 
     my ($shutdown, $reboot) = $haenv->is_node_shutdown();
 
-    my $dc_ha_cfg = $haenv->get_ha_settings();
+    my $dc_ha_cfg = $haenv->get_datacenter_settings();
     my $shutdown_policy = $dc_ha_cfg->{shutdown_policy} // 'conditional';
 
     if ($shutdown) { # don't log this on service restart, only on node shutdown
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 6bd35b3..6c47030 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -425,7 +425,7 @@ sub get_max_workers {
 }
 
 # return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
     my ($self) = @_;
 
     my $datacenterconfig = $self->{hardware}->read_datacenter_conf();
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (6 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Extend existing method rather than introducing a new one.

 src/PVE/HA/Env/PVE2.pm | 10 +++++-----
 src/PVE/HA/LRM.pm      |  4 ++--
 src/PVE/HA/Sim/Env.pm  |  5 ++++-
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index d2c46e8..f6ebfeb 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -456,12 +456,12 @@ sub get_datacenter_settings {
     my ($self) = @_;
 
     my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') };
-    if (my $err = $@) {
-	$self->log('err', "unable to get HA settings from datacenter.cfg - $err");
-	return {};
-    }
+    $self->log('err', "unable to get HA settings from datacenter.cfg - $@") if $@;
 
-    return $datacenterconfig->{ha};
+    return {
+	ha => $datacenterconfig->{ha} // {},
+	crs => $datacenterconfig->{crs} // {},
+    };
 }
 
 sub get_static_node_stats {
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 7750f4d..5d2fa2c 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -59,8 +59,8 @@ sub shutdown_request {
 
     my ($shutdown, $reboot) = $haenv->is_node_shutdown();
 
-    my $dc_ha_cfg = $haenv->get_datacenter_settings();
-    my $shutdown_policy = $dc_ha_cfg->{shutdown_policy} // 'conditional';
+    my $dc_cfg = $haenv->get_datacenter_settings();
+    my $shutdown_policy = $dc_cfg->{ha}->{shutdown_policy} // 'conditional';
 
     if ($shutdown) { # don't log this on service restart, only on node shutdown
 	$haenv->log('info', "got shutdown request with shutdown policy '$shutdown_policy'");
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 6c47030..c6ea73c 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -430,7 +430,10 @@ sub get_datacenter_settings {
 
     my $datacenterconfig = $self->{hardware}->read_datacenter_conf();
 
-    return $datacenterconfig->{ha};
+    return {
+	ha => $datacenterconfig->{ha} // {},
+	crs => $datacenterconfig->{crs} // {},
+    };
 }
 
 sub get_static_node_stats {
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (7 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Switch to get_datacenter_settings() replacing the previous
      get_crs_settings() in v1.

 src/PVE/HA/Manager.pm | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63e6c8a..1638442 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -52,6 +52,11 @@ sub new {
 
     $self->{ms} = { master_node => $haenv->nodename() };
 
+    my $dc_cfg = $haenv->get_datacenter_settings();
+    $self->{'scheduler-mode'} = $dc_cfg->{crs}->{ha} ? $dc_cfg->{crs}->{ha} : 'basic';
+    $haenv->log('info', "using scheduler mode '$self->{'scheduler-mode'}'")
+	if $self->{'scheduler-mode'} ne 'basic';
+
     return $self;
 }
 
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (8 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations with hundreds of services; with low thousands of services
overall and generous estimates, it could even run into the watchdog
timer.

Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.
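
The former could look roughly like this sketch (the exact signature of
get_guest_config_properties() is an assumption here):

    # Assumption: takes a list of property names and returns a
    # { $vmid => { $property => $value } } hash for all guests, so the
    # statically relevant values can be read in one go instead of one
    # load_config() per service and recompute.
    my $props = PVE::Cluster::get_guest_config_properties(
        [qw(cores sockets vcpus cpulimit memory)]);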

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Add fixme note about overhead.
    * Add benchmark results to commit message.

 src/PVE/HA/Manager.pm | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1638442..7f1d1d7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,7 @@ use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
 use PVE::HA::Usage::Basic;
+use PVE::HA::Usage::Static;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -203,14 +204,35 @@ my $valid_service_states = {
     error => 1,
 };
 
+# FIXME with 'static' mode and thousands of services, the overhead can be noticeable and the fact
+# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
+    my $haenv = $self->{haenv};
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    $online_node_usage->add_node($_) for $online_nodes->@*;
+    my $online_node_usage;
+
+    if (my $mode = $self->{'scheduler-mode'}) {
+	if ($mode eq 'static') {
+	    $online_node_usage = eval {
+		my $scheduler = PVE::HA::Usage::Static->new($haenv);
+		$scheduler->add_node($_) for $online_nodes->@*;
+		return $scheduler;
+	    };
+	    $haenv->log('warning', "using 'basic' scheduler mode, init for 'static' failed - $@")
+		if $@;
+	} elsif ($mode ne 'basic') {
+	    $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
+	}
+    }
+
+    if (!$online_node_usage) {
+	$online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+	$online_node_usage->add_node($_) for $online_nodes->@*;
+    }
 
     foreach my $sid (keys %{$self->{ss}}) {
 	my $sd = $self->{ss}->{$sid};
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (9 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 src/PVE/HA/Manager.pm | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 7f1d1d7..cc2ada4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,25 +149,20 @@ sub select_service_node {
 	}
     }
 
+    return $maintenance_fallback
+	if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+
     my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
 	$scores->{$a} <=> $scores->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
 
     my $found;
-    my $found_maintenance_fallback;
     for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
 	my $node = $nodes[$i];
 	if ($node eq $current_node) {
 	    $found = $i;
 	}
-	if (defined($maintenance_fallback) && $node eq $maintenance_fallback) {
-	    $found_maintenance_fallback = $i;
-	}
-    }
-
-    if (defined($found_maintenance_fallback)) {
-	return $nodes[$found_maintenance_fallback];
     }
 
     if ($try_next) {
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current node is valid
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (10 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

This should cover most calls of select_service_node().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes from v1.

 src/PVE/HA/Manager.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index cc2ada4..69bfbc3 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -152,6 +152,8 @@ sub select_service_node {
     return $maintenance_fallback
 	if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
 
+    return $current_node if !$try_next && $pri_groups->{$top_pri}->{$current_node};
+
     my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
     my @nodes = sort {
 	$scores->{$a} <=> $scores->{$b} || $a cmp $b
@@ -171,8 +173,6 @@ sub select_service_node {
 	} else {
 	    return $nodes[0];
 	}
-    } elsif (defined($found)) {
-	return $nodes[$found];
     } else {
 	return $nodes[0];
     }
-- 
2.30.2

* [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (11 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

if something goes wrong with the TOPSIS scoring. Not expected to
happen, but it's rather cheap to be on the safe side.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/PVE/HA/Usage/Static.pm | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index ce705eb..73ce836 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -22,12 +22,15 @@ sub new {
 	'service-stats' => {},
 	haenv => $haenv,
 	scheduler => $scheduler,
+	'service-counts' => {}, # Service count on each node. Fallback if scoring calculation fails.
     }, $class;
 }
 
 sub add_node {
     my ($self, $nodename) = @_;
 
+    $self->{'service-counts'}->{$nodename} = 0;
+
     my $stats = $self->{'node-stats'}->{$nodename}
 	or die "did not get static node usage information for '$nodename'\n";
     die "static node usage information for '$nodename' missing cpu count\n" if !$stats->{cpus};
@@ -40,6 +43,8 @@ sub add_node {
 sub remove_node {
     my ($self, $nodename) = @_;
 
+    delete $self->{'service-counts'}->{$nodename};
+
     $self->{scheduler}->remove_node($nodename);
 }
 
@@ -83,6 +88,8 @@ my sub get_service_usage {
 sub add_service_usage_to_node {
     my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
 
+    $self->{'service-counts'}->{$nodename}++;
+
     eval {
 	my $service_usage = get_service_usage($self, $sid, $service_node, $migration_target);
 	$self->{scheduler}->add_service_usage_to_node($nodename, $service_usage);
@@ -103,8 +110,7 @@ sub score_nodes_to_start_service {
 	    'err',
 	    "unable to score nodes according to static usage for service '$sid' - $err",
 	);
-	# TODO maybe use service count as fallback?
-	return { map { $_ => 1 } $self->list_nodes() };
+	return $self->{'service-counts'};
     }
 
     # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
-- 
2.30.2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (12 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

See the READMEs for more information about the tests.
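
As a rough intuition for the first test, this is what "most available
resources" boils down to there (plain arithmetic, not the actual TOPSIS
code): with equal CPU counts, the surviving node with the lowest projected
memory usage after adding vm:102 should win.

    # memory sizes and service maxmem taken from the test-crs-static1
    # fixtures below; not the real scoring implementation
    my %node_mem = (node2 => 200_000_000_000, node3 => 300_000_000_000);
    my $service_mem = 4_000_000_000;
    for my $node (sort keys %node_mem) {
        printf "%s: %.2f%% memory used\n", $node, 100 * $service_mem / $node_mem{$node};
    }
    # node3 (1.33%) beats node2 (2.00%), matching the expected recovery target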

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/test/test-crs-static1/README              |   4 +
 src/test/test-crs-static1/cmdlist             |   4 +
 src/test/test-crs-static1/datacenter.cfg      |   6 +
 src/test/test-crs-static1/hardware_status     |   5 +
 src/test/test-crs-static1/log.expect          |  50 +++++
 src/test/test-crs-static1/manager_status      |   1 +
 src/test/test-crs-static1/service_config      |   3 +
 .../test-crs-static1/static_service_stats     |   3 +
 src/test/test-crs-static2/README              |   4 +
 src/test/test-crs-static2/cmdlist             |  20 ++
 src/test/test-crs-static2/datacenter.cfg      |   6 +
 src/test/test-crs-static2/groups              |   2 +
 src/test/test-crs-static2/hardware_status     |   7 +
 src/test/test-crs-static2/log.expect          | 171 ++++++++++++++++++
 src/test/test-crs-static2/manager_status      |   1 +
 src/test/test-crs-static2/service_config      |   3 +
 .../test-crs-static2/static_service_stats     |   3 +
 src/test/test-crs-static3/README              |   5 +
 src/test/test-crs-static3/cmdlist             |   4 +
 src/test/test-crs-static3/datacenter.cfg      |   9 +
 src/test/test-crs-static3/hardware_status     |   5 +
 src/test/test-crs-static3/log.expect          | 131 ++++++++++++++
 src/test/test-crs-static3/manager_status      |   1 +
 src/test/test-crs-static3/service_config      |  12 ++
 .../test-crs-static3/static_service_stats     |  12 ++
 src/test/test-crs-static4/README              |   6 +
 src/test/test-crs-static4/cmdlist             |   4 +
 src/test/test-crs-static4/datacenter.cfg      |   9 +
 src/test/test-crs-static4/hardware_status     |   5 +
 src/test/test-crs-static4/log.expect          | 149 +++++++++++++++
 src/test/test-crs-static4/manager_status      |   1 +
 src/test/test-crs-static4/service_config      |  12 ++
 .../test-crs-static4/static_service_stats     |  12 ++
 src/test/test-crs-static5/README              |   5 +
 src/test/test-crs-static5/cmdlist             |   4 +
 src/test/test-crs-static5/datacenter.cfg      |   9 +
 src/test/test-crs-static5/hardware_status     |   5 +
 src/test/test-crs-static5/log.expect          | 117 ++++++++++++
 src/test/test-crs-static5/manager_status      |   1 +
 src/test/test-crs-static5/service_config      |  10 +
 .../test-crs-static5/static_service_stats     |  11 ++
 41 files changed, 832 insertions(+)
 create mode 100644 src/test/test-crs-static1/README
 create mode 100644 src/test/test-crs-static1/cmdlist
 create mode 100644 src/test/test-crs-static1/datacenter.cfg
 create mode 100644 src/test/test-crs-static1/hardware_status
 create mode 100644 src/test/test-crs-static1/log.expect
 create mode 100644 src/test/test-crs-static1/manager_status
 create mode 100644 src/test/test-crs-static1/service_config
 create mode 100644 src/test/test-crs-static1/static_service_stats
 create mode 100644 src/test/test-crs-static2/README
 create mode 100644 src/test/test-crs-static2/cmdlist
 create mode 100644 src/test/test-crs-static2/datacenter.cfg
 create mode 100644 src/test/test-crs-static2/groups
 create mode 100644 src/test/test-crs-static2/hardware_status
 create mode 100644 src/test/test-crs-static2/log.expect
 create mode 100644 src/test/test-crs-static2/manager_status
 create mode 100644 src/test/test-crs-static2/service_config
 create mode 100644 src/test/test-crs-static2/static_service_stats
 create mode 100644 src/test/test-crs-static3/README
 create mode 100644 src/test/test-crs-static3/cmdlist
 create mode 100644 src/test/test-crs-static3/datacenter.cfg
 create mode 100644 src/test/test-crs-static3/hardware_status
 create mode 100644 src/test/test-crs-static3/log.expect
 create mode 100644 src/test/test-crs-static3/manager_status
 create mode 100644 src/test/test-crs-static3/service_config
 create mode 100644 src/test/test-crs-static3/static_service_stats
 create mode 100644 src/test/test-crs-static4/README
 create mode 100644 src/test/test-crs-static4/cmdlist
 create mode 100644 src/test/test-crs-static4/datacenter.cfg
 create mode 100644 src/test/test-crs-static4/hardware_status
 create mode 100644 src/test/test-crs-static4/log.expect
 create mode 100644 src/test/test-crs-static4/manager_status
 create mode 100644 src/test/test-crs-static4/service_config
 create mode 100644 src/test/test-crs-static4/static_service_stats
 create mode 100644 src/test/test-crs-static5/README
 create mode 100644 src/test/test-crs-static5/cmdlist
 create mode 100644 src/test/test-crs-static5/datacenter.cfg
 create mode 100644 src/test/test-crs-static5/hardware_status
 create mode 100644 src/test/test-crs-static5/log.expect
 create mode 100644 src/test/test-crs-static5/manager_status
 create mode 100644 src/test/test-crs-static5/service_config
 create mode 100644 src/test/test-crs-static5/static_service_stats

diff --git a/src/test/test-crs-static1/README b/src/test/test-crs-static1/README
new file mode 100644
index 0000000..483f265
--- /dev/null
+++ b/src/test/test-crs-static1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with the 'static' resource scheduling mode.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-static1/cmdlist b/src/test/test-crs-static1/cmdlist
new file mode 100644
index 0000000..8684073
--- /dev/null
+++ b/src/test/test-crs-static1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-static1/datacenter.cfg b/src/test/test-crs-static1/datacenter.cfg
new file mode 100644
index 0000000..8f83457
--- /dev/null
+++ b/src/test/test-crs-static1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+    "crs": {
+        "ha": "static"
+    }
+}
+
diff --git a/src/test/test-crs-static1/hardware_status b/src/test/test-crs-static1/hardware_status
new file mode 100644
index 0000000..0fa8c26
--- /dev/null
+++ b/src/test/test-crs-static1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 200000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 300000000000 }
+}
diff --git a/src/test/test-crs-static1/log.expect b/src/test/test-crs-static1/log.expect
new file mode 100644
index 0000000..2b06b3c
--- /dev/null
+++ b/src/test/test-crs-static1/log.expect
@@ -0,0 +1,50 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute network node1 off
+info    120    node1/crm: status change master => lost_manager_lock
+info    120    node1/crm: status change lost_manager_lock => wait_for_quorum
+info    121    node1/lrm: status change active => lost_agent_lock
+info    162     watchdog: execute power node1 off
+info    161    node1/crm: killed by poweroff
+info    162    node1/lrm: killed by poweroff
+info    162     hardware: server 'node1' stopped by poweroff (watchdog)
+info    222    node3/crm: got lock 'ha_manager_lock'
+info    222    node3/crm: status change slave => master
+info    222    node3/crm: using scheduler mode 'static'
+info    222    node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info    282    node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    282    node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai    282    node3/crm: FENCE: Try to fence node 'node1'
+info    282    node3/crm: got lock 'ha_agent_node1_lock'
+info    282    node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai    282    node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    282    node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info    282    node3/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node3)
+info    283    node3/lrm: got lock 'ha_agent_node3_lock'
+info    283    node3/lrm: status change wait_for_agent_lock => active
+info    283    node3/lrm: starting service vm:102
+info    283    node3/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-static1/manager_status b/src/test/test-crs-static1/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static1/service_config b/src/test/test-crs-static1/service_config
new file mode 100644
index 0000000..9c12447
--- /dev/null
+++ b/src/test/test-crs-static1/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static1/static_service_stats b/src/test/test-crs-static1/static_service_stats
new file mode 100644
index 0000000..7fb992d
--- /dev/null
+++ b/src/test/test-crs-static1/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "maxcpu": 2, "maxmem": 4000000000 }
+}
diff --git a/src/test/test-crs-static2/README b/src/test/test-crs-static2/README
new file mode 100644
index 0000000..61530a7
--- /dev/null
+++ b/src/test/test-crs-static2/README
@@ -0,0 +1,4 @@
+Test how service recovery works with the 'static' resource scheduling mode.
+
+Expect that the single service always gets recovered to the node with the most
+available resources. Also tests that the group priority still takes precedence.
diff --git a/src/test/test-crs-static2/cmdlist b/src/test/test-crs-static2/cmdlist
new file mode 100644
index 0000000..bada1bb
--- /dev/null
+++ b/src/test/test-crs-static2/cmdlist
@@ -0,0 +1,20 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "power node5 on" ],
+    [ "power node1 off" ],
+    [ "delay 300" ],
+    [ "power node1 on" ],
+    [ "delay 300" ],
+    [ "power node4 on" ],
+    [ "power node1 off" ],
+    [ "delay 300" ],
+    [ "power node1 on" ],
+    [ "delay 300" ],
+    [ "power node2 off" ],
+    [ "power node1 off" ],
+    [ "delay 300" ],
+    [ "power node1 on" ],
+    [ "delay 300" ],
+    [ "power node2 on" ],
+    [ "power node3 off" ],
+    [ "power node1 off" ]
+]
diff --git a/src/test/test-crs-static2/datacenter.cfg b/src/test/test-crs-static2/datacenter.cfg
new file mode 100644
index 0000000..8f83457
--- /dev/null
+++ b/src/test/test-crs-static2/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+    "crs": {
+        "ha": "static"
+    }
+}
+
diff --git a/src/test/test-crs-static2/groups b/src/test/test-crs-static2/groups
new file mode 100644
index 0000000..43e9bf5
--- /dev/null
+++ b/src/test/test-crs-static2/groups
@@ -0,0 +1,2 @@
+group: prefer_node1
+        nodes node1
diff --git a/src/test/test-crs-static2/hardware_status b/src/test/test-crs-static2/hardware_status
new file mode 100644
index 0000000..d426023
--- /dev/null
+++ b/src/test/test-crs-static2/hardware_status
@@ -0,0 +1,7 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 200000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 300000000000 },
+  "node4": { "power": "off", "network": "off", "cpus": 64, "memory": 300000000000 },
+  "node5": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static2/log.expect b/src/test/test-crs-static2/log.expect
new file mode 100644
index 0000000..ee4416c
--- /dev/null
+++ b/src/test/test-crs-static2/log.expect
@@ -0,0 +1,171 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node5 on
+info     20    node5/crm: status change startup => wait_for_quorum
+info     20    node5/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     26    node5/crm: status change wait_for_quorum => slave
+info    120      cmdlist: execute power node1 off
+info    120    node1/crm: killed by poweroff
+info    120    node1/lrm: killed by poweroff
+info    220      cmdlist: execute delay 300
+info    222    node3/crm: got lock 'ha_manager_lock'
+info    222    node3/crm: status change slave => master
+info    222    node3/crm: using scheduler mode 'static'
+info    222    node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info    282    node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    282    node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai    282    node3/crm: FENCE: Try to fence node 'node1'
+info    282    node3/crm: got lock 'ha_agent_node1_lock'
+info    282    node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai    282    node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info    282    node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    282    node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info    282    node3/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node3)
+info    283    node3/lrm: got lock 'ha_agent_node3_lock'
+info    283    node3/lrm: status change wait_for_agent_lock => active
+info    283    node3/lrm: starting service vm:102
+info    283    node3/lrm: service status vm:102 started
+info    600      cmdlist: execute power node1 on
+info    600    node1/crm: status change startup => wait_for_quorum
+info    600    node1/lrm: status change startup => wait_for_agent_lock
+info    600    node1/crm: status change wait_for_quorum => slave
+info    604    node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info    604    node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info    604    node3/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node3, target = node1)
+info    605    node3/lrm: service vm:102 - start migrate to node 'node1'
+info    605    node3/lrm: service vm:102 - end migrate to node 'node1'
+info    624    node3/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info    641    node1/lrm: got lock 'ha_agent_node1_lock'
+info    641    node1/lrm: status change wait_for_agent_lock => active
+info    641    node1/lrm: starting service vm:102
+info    641    node1/lrm: service status vm:102 started
+info    700      cmdlist: execute delay 300
+info   1080      cmdlist: execute power node4 on
+info   1080    node4/crm: status change startup => wait_for_quorum
+info   1080    node4/lrm: status change startup => wait_for_agent_lock
+info   1084    node3/crm: node 'node4': state changed from 'unknown' => 'online'
+info   1086    node4/crm: status change wait_for_quorum => slave
+info   1180      cmdlist: execute power node1 off
+info   1180    node1/crm: killed by poweroff
+info   1180    node1/lrm: killed by poweroff
+info   1182    node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info   1242    node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info   1242    node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai   1242    node3/crm: FENCE: Try to fence node 'node1'
+info   1280      cmdlist: execute delay 300
+info   1282    node3/crm: got lock 'ha_agent_node1_lock'
+info   1282    node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info   1282    node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai   1282    node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info   1282    node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info   1282    node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info   1282    node3/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node4)
+info   1285    node4/lrm: got lock 'ha_agent_node4_lock'
+info   1285    node4/lrm: status change wait_for_agent_lock => active
+info   1285    node4/lrm: starting service vm:102
+info   1285    node4/lrm: service status vm:102 started
+info   1660      cmdlist: execute power node1 on
+info   1660    node1/crm: status change startup => wait_for_quorum
+info   1660    node1/lrm: status change startup => wait_for_agent_lock
+info   1660    node1/crm: status change wait_for_quorum => slave
+info   1664    node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info   1664    node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info   1664    node3/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node4, target = node1)
+info   1667    node4/lrm: service vm:102 - start migrate to node 'node1'
+info   1667    node4/lrm: service vm:102 - end migrate to node 'node1'
+info   1684    node3/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info   1701    node1/lrm: got lock 'ha_agent_node1_lock'
+info   1701    node1/lrm: status change wait_for_agent_lock => active
+info   1701    node1/lrm: starting service vm:102
+info   1701    node1/lrm: service status vm:102 started
+info   1760      cmdlist: execute delay 300
+info   1825    node3/lrm: node had no service configured for 60 rounds, going idle.
+info   1825    node3/lrm: status change active => wait_for_agent_lock
+info   2140      cmdlist: execute power node2 off
+info   2140    node2/crm: killed by poweroff
+info   2140    node2/lrm: killed by poweroff
+info   2142    node3/crm: node 'node2': state changed from 'online' => 'unknown'
+info   2240      cmdlist: execute power node1 off
+info   2240    node1/crm: killed by poweroff
+info   2240    node1/lrm: killed by poweroff
+info   2240    node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info   2300    node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info   2300    node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai   2300    node3/crm: FENCE: Try to fence node 'node1'
+info   2340      cmdlist: execute delay 300
+info   2360    node3/crm: got lock 'ha_agent_node1_lock'
+info   2360    node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info   2360    node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai   2360    node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info   2360    node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info   2360    node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info   2360    node3/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node4)
+info   2363    node4/lrm: starting service vm:102
+info   2363    node4/lrm: service status vm:102 started
+info   2720      cmdlist: execute power node1 on
+info   2720    node1/crm: status change startup => wait_for_quorum
+info   2720    node1/lrm: status change startup => wait_for_agent_lock
+info   2720    node1/crm: status change wait_for_quorum => slave
+info   2722    node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info   2722    node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info   2722    node3/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node4, target = node1)
+info   2725    node4/lrm: service vm:102 - start migrate to node 'node1'
+info   2725    node4/lrm: service vm:102 - end migrate to node 'node1'
+info   2742    node3/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node1)
+info   2761    node1/lrm: got lock 'ha_agent_node1_lock'
+info   2761    node1/lrm: status change wait_for_agent_lock => active
+info   2761    node1/lrm: starting service vm:102
+info   2761    node1/lrm: service status vm:102 started
+info   2820      cmdlist: execute delay 300
+info   3200      cmdlist: execute power node2 on
+info   3200    node2/crm: status change startup => wait_for_quorum
+info   3200    node2/lrm: status change startup => wait_for_agent_lock
+info   3202    node2/crm: status change wait_for_quorum => slave
+info   3204    node3/crm: node 'node2': state changed from 'unknown' => 'online'
+info   3300      cmdlist: execute power node3 off
+info   3300    node3/crm: killed by poweroff
+info   3300    node3/lrm: killed by poweroff
+info   3400      cmdlist: execute power node1 off
+info   3400    node1/crm: killed by poweroff
+info   3400    node1/lrm: killed by poweroff
+info   3420    node2/crm: got lock 'ha_manager_lock'
+info   3420    node2/crm: status change slave => master
+info   3420    node2/crm: using scheduler mode 'static'
+info   3420    node2/crm: node 'node1': state changed from 'online' => 'unknown'
+info   3420    node2/crm: node 'node3': state changed from 'online' => 'unknown'
+info   3480    node2/crm: service 'vm:102': state changed from 'started' to 'fence'
+info   3480    node2/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai   3480    node2/crm: FENCE: Try to fence node 'node1'
+info   3520    node2/crm: got lock 'ha_agent_node1_lock'
+info   3520    node2/crm: fencing: acknowledged - got agent lock for node 'node1'
+info   3520    node2/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai   3520    node2/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info   3520    node2/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info   3520    node2/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info   3520    node2/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node4)
+info   3523    node4/lrm: starting service vm:102
+info   3523    node4/lrm: service status vm:102 started
+info   4000     hardware: exit simulation - done
diff --git a/src/test/test-crs-static2/manager_status b/src/test/test-crs-static2/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static2/service_config b/src/test/test-crs-static2/service_config
new file mode 100644
index 0000000..1f2333d
--- /dev/null
+++ b/src/test/test-crs-static2/service_config
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "node": "node1", "state": "enabled", "group": "prefer_node1" }
+}
diff --git a/src/test/test-crs-static2/static_service_stats b/src/test/test-crs-static2/static_service_stats
new file mode 100644
index 0000000..7fb992d
--- /dev/null
+++ b/src/test/test-crs-static2/static_service_stats
@@ -0,0 +1,3 @@
+{
+    "vm:102": { "maxcpu": 2, "maxmem": 4000000000 }
+}
diff --git a/src/test/test-crs-static3/README b/src/test/test-crs-static3/README
new file mode 100644
index 0000000..db929e1
--- /dev/null
+++ b/src/test/test-crs-static3/README
@@ -0,0 +1,5 @@
+Test how shutdown migrate policy works with the 'static' resource scheduling
+mode.
+
+Expect that, when node1 is shut down, the services get migrated in the repeating
+sequence node2, node2, node3, because node2 has twice the resources of node3.
diff --git a/src/test/test-crs-static3/cmdlist b/src/test/test-crs-static3/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static3/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static3/datacenter.cfg b/src/test/test-crs-static3/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static3/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+    "crs": {
+        "ha": "static"
+    },
+    "ha": {
+        "shutdown_policy": "migrate"
+    }
+}
+
diff --git a/src/test/test-crs-static3/hardware_status b/src/test/test-crs-static3/hardware_status
new file mode 100644
index 0000000..dfbf496
--- /dev/null
+++ b/src/test/test-crs-static3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 64, "memory": 200000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static3/log.expect b/src/test/test-crs-static3/log.expect
new file mode 100644
index 0000000..00cfefb
--- /dev/null
+++ b/src/test/test-crs-static3/log.expect
@@ -0,0 +1,131 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:100' on node 'node1'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node1'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: adding new service 'vm:105' on node 'node1'
+info     20    node1/crm: adding new service 'vm:106' on node 'node1'
+info     20    node1/crm: adding new service 'vm:107' on node 'node1'
+info     20    node1/crm: adding new service 'vm:108' on node 'node1'
+info     20    node1/crm: adding new service 'vm:109' on node 'node1'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     21    node1/lrm: starting service vm:105
+info     21    node1/lrm: service status vm:105 started
+info     21    node1/lrm: starting service vm:106
+info     21    node1/lrm: service status vm:106 started
+info     21    node1/lrm: starting service vm:107
+info     21    node1/lrm: service status vm:107 started
+info     21    node1/lrm: starting service vm:108
+info     21    node1/lrm: service status vm:108 started
+info     21    node1/lrm: starting service vm:109
+info     21    node1/lrm: service status vm:109 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: service 'vm:100': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute shutdown node1
+info    120    node1/lrm: got shutdown request with shutdown policy 'migrate'
+info    120    node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info    120    node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info    120    node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info    120    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:102' to node 'node2' (running)
+info    120    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info    120    node1/crm: service 'vm:103': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    120    node1/crm: migrate service 'vm:104' to node 'node2' (running)
+info    120    node1/crm: service 'vm:104': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:105' to node 'node2' (running)
+info    120    node1/crm: service 'vm:105': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:106' to node 'node3' (running)
+info    120    node1/crm: service 'vm:106': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    120    node1/crm: migrate service 'vm:107' to node 'node2' (running)
+info    120    node1/crm: service 'vm:107': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:108' to node 'node2' (running)
+info    120    node1/crm: service 'vm:108': state changed from 'started' to 'migrate'  (node = node1, target = node2)
+info    120    node1/crm: migrate service 'vm:109' to node 'node3' (running)
+info    120    node1/crm: service 'vm:109': state changed from 'started' to 'migrate'  (node = node1, target = node3)
+info    121    node1/lrm: status change active => maintenance
+info    121    node1/lrm: service vm:101 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:101 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:102 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:102 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:103 - start migrate to node 'node3'
+info    121    node1/lrm: service vm:103 - end migrate to node 'node3'
+info    121    node1/lrm: service vm:104 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:104 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:105 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:105 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:106 - start migrate to node 'node3'
+info    121    node1/lrm: service vm:106 - end migrate to node 'node3'
+info    121    node1/lrm: service vm:107 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:107 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:108 - start migrate to node 'node2'
+info    121    node1/lrm: service vm:108 - end migrate to node 'node2'
+info    121    node1/lrm: service vm:109 - start migrate to node 'node3'
+info    121    node1/lrm: service vm:109 - end migrate to node 'node3'
+info    140    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node3)
+info    140    node1/crm: service 'vm:104': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:105': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:106': state changed from 'migrate' to 'started'  (node = node3)
+info    140    node1/crm: service 'vm:107': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:108': state changed from 'migrate' to 'started'  (node = node2)
+info    140    node1/crm: service 'vm:109': state changed from 'migrate' to 'started'  (node = node3)
+info    142    node1/lrm: exit (loop end)
+info    142     shutdown: execute crm node1 stop
+info    141    node1/crm: server received shutdown request
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    143    node2/lrm: starting service vm:101
+info    143    node2/lrm: service status vm:101 started
+info    143    node2/lrm: starting service vm:102
+info    143    node2/lrm: service status vm:102 started
+info    143    node2/lrm: starting service vm:104
+info    143    node2/lrm: service status vm:104 started
+info    143    node2/lrm: starting service vm:105
+info    143    node2/lrm: service status vm:105 started
+info    143    node2/lrm: starting service vm:107
+info    143    node2/lrm: service status vm:107 started
+info    143    node2/lrm: starting service vm:108
+info    143    node2/lrm: service status vm:108 started
+info    145    node3/lrm: got lock 'ha_agent_node3_lock'
+info    145    node3/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: starting service vm:103
+info    145    node3/lrm: service status vm:103 started
+info    145    node3/lrm: starting service vm:106
+info    145    node3/lrm: service status vm:106 started
+info    145    node3/lrm: starting service vm:109
+info    145    node3/lrm: service status vm:109 started
+info    160    node1/crm: voluntary release CRM lock
+info    161    node1/crm: exit (loop end)
+info    161     shutdown: execute power node1 off
+info    161    node2/crm: got lock 'ha_manager_lock'
+info    161    node2/crm: status change slave => master
+info    161    node2/crm: using scheduler mode 'static'
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-static3/manager_status b/src/test/test-crs-static3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static3/service_config b/src/test/test-crs-static3/service_config
new file mode 100644
index 0000000..47f94d3
--- /dev/null
+++ b/src/test/test-crs-static3/service_config
@@ -0,0 +1,12 @@
+{
+    "vm:100": { "node": "node1", "state": "stopped" },
+    "vm:101": { "node": "node1", "state": "enabled" },
+    "vm:102": { "node": "node1", "state": "enabled" },
+    "vm:103": { "node": "node1", "state": "enabled" },
+    "vm:104": { "node": "node1", "state": "enabled" },
+    "vm:105": { "node": "node1", "state": "enabled" },
+    "vm:106": { "node": "node1", "state": "enabled" },
+    "vm:107": { "node": "node1", "state": "enabled" },
+    "vm:108": { "node": "node1", "state": "enabled" },
+    "vm:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static3/static_service_stats b/src/test/test-crs-static3/static_service_stats
new file mode 100644
index 0000000..bca71cb
--- /dev/null
+++ b/src/test/test-crs-static3/static_service_stats
@@ -0,0 +1,12 @@
+{
+    "vm:100": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:101": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:102": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:103": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:104": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:105": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:106": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:107": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:108": { "maxcpu": 2, "maxmem": 4000000000 },
+    "vm:109": { "maxcpu": 2, "maxmem": 4000000000 }
+}
diff --git a/src/test/test-crs-static4/README b/src/test/test-crs-static4/README
new file mode 100644
index 0000000..4dfb1bc
--- /dev/null
+++ b/src/test/test-crs-static4/README
@@ -0,0 +1,6 @@
+Test how shutdown migrate policy works with the 'static' resource scheduling
+mode.
+
+Expect that, when node1 is shut down, the first service is migrated to node2 and
+all others to node3, because the first service is very resource-heavy compared
+to the others.
diff --git a/src/test/test-crs-static4/cmdlist b/src/test/test-crs-static4/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static4/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static4/datacenter.cfg b/src/test/test-crs-static4/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+    "crs": {
+        "ha": "static"
+    },
+    "ha": {
+        "shutdown_policy": "migrate"
+    }
+}
+
diff --git a/src/test/test-crs-static4/hardware_status b/src/test/test-crs-static4/hardware_status
new file mode 100644
index 0000000..a83a2dc
--- /dev/null
+++ b/src/test/test-crs-static4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static4/log.expect b/src/test/test-crs-static4/log.expect
new file mode 100644
index 0000000..3eedc23
--- /dev/null
+++ b/src/test/test-crs-static4/log.expect
@@ -0,0 +1,149 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'ct:100' on node 'node1'
+info     20    node1/crm: adding new service 'ct:101' on node 'node1'
+info     20    node1/crm: adding new service 'ct:102' on node 'node1'
+info     20    node1/crm: adding new service 'ct:103' on node 'node1'
+info     20    node1/crm: adding new service 'ct:104' on node 'node1'
+info     20    node1/crm: adding new service 'ct:105' on node 'node1'
+info     20    node1/crm: adding new service 'ct:106' on node 'node1'
+info     20    node1/crm: adding new service 'ct:107' on node 'node1'
+info     20    node1/crm: adding new service 'ct:108' on node 'node1'
+info     20    node1/crm: adding new service 'ct:109' on node 'node1'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service ct:101
+info     21    node1/lrm: service status ct:101 started
+info     21    node1/lrm: starting service ct:102
+info     21    node1/lrm: service status ct:102 started
+info     21    node1/lrm: starting service ct:103
+info     21    node1/lrm: service status ct:103 started
+info     21    node1/lrm: starting service ct:104
+info     21    node1/lrm: service status ct:104 started
+info     21    node1/lrm: starting service ct:105
+info     21    node1/lrm: service status ct:105 started
+info     21    node1/lrm: starting service ct:106
+info     21    node1/lrm: service status ct:106 started
+info     21    node1/lrm: starting service ct:107
+info     21    node1/lrm: service status ct:107 started
+info     21    node1/lrm: starting service ct:108
+info     21    node1/lrm: service status ct:108 started
+info     21    node1/lrm: starting service ct:109
+info     21    node1/lrm: service status ct:109 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: service 'ct:100': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute shutdown node1
+info    120    node1/lrm: got shutdown request with shutdown policy 'migrate'
+info    120    node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info    120    node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info    120    node1/crm: relocate service 'ct:101' to node 'node2'
+info    120    node1/crm: service 'ct:101': state changed from 'started' to 'relocate'  (node = node1, target = node2)
+info    120    node1/crm: relocate service 'ct:102' to node 'node3'
+info    120    node1/crm: service 'ct:102': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:103' to node 'node3'
+info    120    node1/crm: service 'ct:103': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:104' to node 'node3'
+info    120    node1/crm: service 'ct:104': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:105' to node 'node3'
+info    120    node1/crm: service 'ct:105': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:106' to node 'node3'
+info    120    node1/crm: service 'ct:106': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:107' to node 'node3'
+info    120    node1/crm: service 'ct:107': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:108' to node 'node3'
+info    120    node1/crm: service 'ct:108': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:109' to node 'node3'
+info    120    node1/crm: service 'ct:109': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    121    node1/lrm: status change active => maintenance
+info    121    node1/lrm: service ct:101 - start relocate to node 'node2'
+info    121    node1/lrm: stopping service ct:101 (relocate)
+info    121    node1/lrm: service status ct:101 stopped
+info    121    node1/lrm: service ct:101 - end relocate to node 'node2'
+info    121    node1/lrm: service ct:102 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:102 (relocate)
+info    121    node1/lrm: service status ct:102 stopped
+info    121    node1/lrm: service ct:102 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:103 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:103 (relocate)
+info    121    node1/lrm: service status ct:103 stopped
+info    121    node1/lrm: service ct:103 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:104 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:104 (relocate)
+info    121    node1/lrm: service status ct:104 stopped
+info    121    node1/lrm: service ct:104 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:105 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:105 (relocate)
+info    121    node1/lrm: service status ct:105 stopped
+info    121    node1/lrm: service ct:105 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:106 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:106 (relocate)
+info    121    node1/lrm: service status ct:106 stopped
+info    121    node1/lrm: service ct:106 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:107 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:107 (relocate)
+info    121    node1/lrm: service status ct:107 stopped
+info    121    node1/lrm: service ct:107 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:108 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:108 (relocate)
+info    121    node1/lrm: service status ct:108 stopped
+info    121    node1/lrm: service ct:108 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:109 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:109 (relocate)
+info    121    node1/lrm: service status ct:109 stopped
+info    121    node1/lrm: service ct:109 - end relocate to node 'node3'
+info    140    node1/crm: service 'ct:101': state changed from 'relocate' to 'started'  (node = node2)
+info    140    node1/crm: service 'ct:102': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:103': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:104': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:105': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:106': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:107': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:108': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:109': state changed from 'relocate' to 'started'  (node = node3)
+info    142    node1/lrm: exit (loop end)
+info    142     shutdown: execute crm node1 stop
+info    141    node1/crm: server received shutdown request
+info    143    node2/lrm: got lock 'ha_agent_node2_lock'
+info    143    node2/lrm: status change wait_for_agent_lock => active
+info    143    node2/lrm: starting service ct:101
+info    143    node2/lrm: service status ct:101 started
+info    145    node3/lrm: got lock 'ha_agent_node3_lock'
+info    145    node3/lrm: status change wait_for_agent_lock => active
+info    145    node3/lrm: starting service ct:102
+info    145    node3/lrm: service status ct:102 started
+info    145    node3/lrm: starting service ct:103
+info    145    node3/lrm: service status ct:103 started
+info    145    node3/lrm: starting service ct:104
+info    145    node3/lrm: service status ct:104 started
+info    145    node3/lrm: starting service ct:105
+info    145    node3/lrm: service status ct:105 started
+info    145    node3/lrm: starting service ct:106
+info    145    node3/lrm: service status ct:106 started
+info    145    node3/lrm: starting service ct:107
+info    145    node3/lrm: service status ct:107 started
+info    145    node3/lrm: starting service ct:108
+info    145    node3/lrm: service status ct:108 started
+info    145    node3/lrm: starting service ct:109
+info    145    node3/lrm: service status ct:109 started
+info    160    node1/crm: voluntary release CRM lock
+info    161    node1/crm: exit (loop end)
+info    161     shutdown: execute power node1 off
+info    161    node2/crm: got lock 'ha_manager_lock'
+info    161    node2/crm: status change slave => master
+info    161    node2/crm: using scheduler mode 'static'
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-static4/manager_status b/src/test/test-crs-static4/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static4/service_config b/src/test/test-crs-static4/service_config
new file mode 100644
index 0000000..b984a09
--- /dev/null
+++ b/src/test/test-crs-static4/service_config
@@ -0,0 +1,12 @@
+{
+    "ct:100": { "node": "node1", "state": "stopped" },
+    "ct:101": { "node": "node1", "state": "enabled" },
+    "ct:102": { "node": "node1", "state": "enabled" },
+    "ct:103": { "node": "node1", "state": "enabled" },
+    "ct:104": { "node": "node1", "state": "enabled" },
+    "ct:105": { "node": "node1", "state": "enabled" },
+    "ct:106": { "node": "node1", "state": "enabled" },
+    "ct:107": { "node": "node1", "state": "enabled" },
+    "ct:108": { "node": "node1", "state": "enabled" },
+    "ct:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static4/static_service_stats b/src/test/test-crs-static4/static_service_stats
new file mode 100644
index 0000000..878709b
--- /dev/null
+++ b/src/test/test-crs-static4/static_service_stats
@@ -0,0 +1,12 @@
+{
+    "ct:100": { "maxcpu": 2, "maxmem": 4000000000 },
+    "ct:101": { "maxcpu": 0, "maxmem": 40000000000 },
+    "ct:102": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:103": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:104": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:105": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:106": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:107": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:108": { "maxcpu": 2, "maxmem": 2000000000 },
+    "ct:109": { "maxcpu": 2, "maxmem": 2000000000 }
+}
diff --git a/src/test/test-crs-static5/README b/src/test/test-crs-static5/README
new file mode 100644
index 0000000..d9b5dc7
--- /dev/null
+++ b/src/test/test-crs-static5/README
@@ -0,0 +1,5 @@
+Test how recovery works with the 'static' resource scheduling mode.
+
+Expect that, when node1 is shut down, all services are migrated to node3:
+the services need little memory, node2 and node3 each already run one
+high-memory service, and node3 has far more CPU capacity left over.
diff --git a/src/test/test-crs-static5/cmdlist b/src/test/test-crs-static5/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static5/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static5/datacenter.cfg b/src/test/test-crs-static5/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static5/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+    "crs": {
+        "ha": "static"
+    },
+    "ha": {
+        "shutdown_policy": "migrate"
+    }
+}
+
diff --git a/src/test/test-crs-static5/hardware_status b/src/test/test-crs-static5/hardware_status
new file mode 100644
index 0000000..3eb9e73
--- /dev/null
+++ b/src/test/test-crs-static5/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 128, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static5/log.expect b/src/test/test-crs-static5/log.expect
new file mode 100644
index 0000000..cb6b0d5
--- /dev/null
+++ b/src/test/test-crs-static5/log.expect
@@ -0,0 +1,117 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: using scheduler mode 'static'
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'ct:102' on node 'node2'
+info     20    node1/crm: adding new service 'ct:103' on node 'node3'
+info     20    node1/crm: adding new service 'ct:104' on node 'node1'
+info     20    node1/crm: adding new service 'ct:105' on node 'node1'
+info     20    node1/crm: adding new service 'ct:106' on node 'node1'
+info     20    node1/crm: adding new service 'ct:107' on node 'node1'
+info     20    node1/crm: adding new service 'ct:108' on node 'node1'
+info     20    node1/crm: adding new service 'ct:109' on node 'node1'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service ct:104
+info     21    node1/lrm: service status ct:104 started
+info     21    node1/lrm: starting service ct:105
+info     21    node1/lrm: service status ct:105 started
+info     21    node1/lrm: starting service ct:106
+info     21    node1/lrm: service status ct:106 started
+info     21    node1/lrm: starting service ct:107
+info     21    node1/lrm: service status ct:107 started
+info     21    node1/lrm: starting service ct:108
+info     21    node1/lrm: service status ct:108 started
+info     21    node1/lrm: starting service ct:109
+info     21    node1/lrm: service status ct:109 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service ct:102
+info     23    node2/lrm: service status ct:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service ct:103
+info     25    node3/lrm: service status ct:103 started
+info    120      cmdlist: execute shutdown node1
+info    120    node1/lrm: got shutdown request with shutdown policy 'migrate'
+info    120    node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info    120    node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info    120    node1/crm: relocate service 'ct:104' to node 'node3'
+info    120    node1/crm: service 'ct:104': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:105' to node 'node3'
+info    120    node1/crm: service 'ct:105': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:106' to node 'node3'
+info    120    node1/crm: service 'ct:106': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:107' to node 'node3'
+info    120    node1/crm: service 'ct:107': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:108' to node 'node3'
+info    120    node1/crm: service 'ct:108': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    120    node1/crm: relocate service 'ct:109' to node 'node3'
+info    120    node1/crm: service 'ct:109': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    121    node1/lrm: status change active => maintenance
+info    121    node1/lrm: service ct:104 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:104 (relocate)
+info    121    node1/lrm: service status ct:104 stopped
+info    121    node1/lrm: service ct:104 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:105 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:105 (relocate)
+info    121    node1/lrm: service status ct:105 stopped
+info    121    node1/lrm: service ct:105 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:106 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:106 (relocate)
+info    121    node1/lrm: service status ct:106 stopped
+info    121    node1/lrm: service ct:106 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:107 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:107 (relocate)
+info    121    node1/lrm: service status ct:107 stopped
+info    121    node1/lrm: service ct:107 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:108 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:108 (relocate)
+info    121    node1/lrm: service status ct:108 stopped
+info    121    node1/lrm: service ct:108 - end relocate to node 'node3'
+info    121    node1/lrm: service ct:109 - start relocate to node 'node3'
+info    121    node1/lrm: stopping service ct:109 (relocate)
+info    121    node1/lrm: service status ct:109 stopped
+info    121    node1/lrm: service ct:109 - end relocate to node 'node3'
+info    140    node1/crm: service 'ct:104': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:105': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:106': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:107': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:108': state changed from 'relocate' to 'started'  (node = node3)
+info    140    node1/crm: service 'ct:109': state changed from 'relocate' to 'started'  (node = node3)
+info    142    node1/lrm: exit (loop end)
+info    142     shutdown: execute crm node1 stop
+info    141    node1/crm: server received shutdown request
+info    145    node3/lrm: starting service ct:104
+info    145    node3/lrm: service status ct:104 started
+info    145    node3/lrm: starting service ct:105
+info    145    node3/lrm: service status ct:105 started
+info    145    node3/lrm: starting service ct:106
+info    145    node3/lrm: service status ct:106 started
+info    145    node3/lrm: starting service ct:107
+info    145    node3/lrm: service status ct:107 started
+info    145    node3/lrm: starting service ct:108
+info    145    node3/lrm: service status ct:108 started
+info    145    node3/lrm: starting service ct:109
+info    145    node3/lrm: service status ct:109 started
+info    160    node1/crm: voluntary release CRM lock
+info    161    node1/crm: exit (loop end)
+info    161     shutdown: execute power node1 off
+info    161    node2/crm: got lock 'ha_manager_lock'
+info    161    node2/crm: status change slave => master
+info    161    node2/crm: using scheduler mode 'static'
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-crs-static5/manager_status b/src/test/test-crs-static5/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static5/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static5/service_config b/src/test/test-crs-static5/service_config
new file mode 100644
index 0000000..43c5f60
--- /dev/null
+++ b/src/test/test-crs-static5/service_config
@@ -0,0 +1,10 @@
+{
+    "ct:102": { "node": "node2", "state": "enabled" },
+    "ct:103": { "node": "node3", "state": "enabled" },
+    "ct:104": { "node": "node1", "state": "enabled" },
+    "ct:105": { "node": "node1", "state": "enabled" },
+    "ct:106": { "node": "node1", "state": "enabled" },
+    "ct:107": { "node": "node1", "state": "enabled" },
+    "ct:108": { "node": "node1", "state": "enabled" },
+    "ct:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static5/static_service_stats b/src/test/test-crs-static5/static_service_stats
new file mode 100644
index 0000000..6293f63
--- /dev/null
+++ b/src/test/test-crs-static5/static_service_stats
@@ -0,0 +1,11 @@
+{
+    "ct:101": { "maxcpu": 0,   "maxmem": 40000000000 },
+    "ct:102": { "maxcpu": 0.5, "maxmem": 40000000000 },
+    "ct:103": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:104": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:105": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:106": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:107": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:108": { "maxcpu": 0.5, "maxmem": 200000000 },
+    "ct:109": { "maxcpu": 0.5, "maxmem": 200000000 }
+}
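
As a rough sanity check of the expected outcome, using the values from the
hardware_status and static_service_stats files above (this simplification
only looks at per-node usage fractions and ignores the cross-node
max/average weighting described in the docs patch):

use strict;
use warnings;

# Hypothetical helper, not part of the series: print each node's CPU and
# memory fraction after placing one small service (0.5 CPU, 200 MB) on it.
my %nodes = (
    node2 => { maxcpu => 32,  maxmem => 100e9, cpu => 0.5, mem => 40e9  },
    node3 => { maxcpu => 128, maxmem => 100e9, cpu => 0.5, mem => 0.2e9 },
);
for my $name (sort keys %nodes) {
    my $n = $nodes{$name};
    printf "%s: cpu %.4f, mem %.4f\n", $name,
        ($n->{cpu} + 0.5) / $n->{maxcpu},    # one of ct:104..ct:109
        ($n->{mem} + 0.2e9) / $n->{maxmem};
}
# node3 ends up much lower on both axes, so it is the expected target.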
-- 
2.30.2






* [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (13 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-18  7:48   ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 src/PVE/HA/Resources/PVECT.pm | 2 ++
 src/PVE/HA/Resources/PVEVM.pm | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index 4c9530d..e77d98c 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -3,6 +3,8 @@ package PVE::HA::Resources::PVECT;
 use strict;
 use warnings;
 
+use PVE::Cluster;
+
 use PVE::HA::Tools;
 
 BEGIN {
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index 49e4a1d..f405d86 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -3,6 +3,8 @@ package PVE::HA::Resources::PVEVM;
 use strict;
 use warnings;
 
+use PVE::Cluster;
+
 use PVE::HA::Tools;
 
 BEGIN {
-- 
2.30.2






* [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (14 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
  2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

briefly describing the 'basic' and 'static' modes, with a note
mentioning plans for balancers.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Mention that it also affects shutdown policy migrations.
    * Describe static mode in more detail.

 ha-manager.adoc | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 54db2a5..038193f 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -933,6 +933,51 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
 immediate node reboot or even reset.
 
 
+Scheduler Mode
+--------------
+
+The scheduler mode controls how HA selects nodes for the recovery of a service
+as well as for migrations that are triggered by a shutdown policy. The default
+mode is `basic`; you can change it in `datacenter.cfg`:
+
+----
+crs: ha=static
+----
+
+The change will be in effect when a new master takes over. This can be triggered
+by executing the following on the current master's node:
+
+----
+systemctl reload-or-restart pve-ha-crm.service
+----
+
+For each service that needs to be recovered or migrated, the scheduler
+iteratively chooses the best node among the nodes with the highest priority in
+the service's group.
+
+NOTE: There are plans to add modes for (static and dynamic) load-balancing in
+the future.
+
+Basic
+^^^^^
+
+The number of active HA services on each node is used to choose a recovery node.
+
+Static
+^^^^^^
+
+Static usage information from the HA services on each node is used to choose a
+recovery node.
+
+For this selection, each node in turn is considered as if the service were
+already running on it, using the CPU and memory usage from the associated guest
+configuration. Then, for each such alternative, the CPU and memory usage of all
+nodes is considered, with memory weighted much more heavily, because it's a
+truly limited resource. For both CPU and memory, the highest usage among the
+nodes (weighted more, as ideally no node should be overcommitted) and the
+average usage across all nodes (to still be able to distinguish in case one
+node is already more highly committed) are taken into account.
+
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
 endif::manvolnum[]
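
To make the above more concrete, a minimal Perl sketch of the described
selection (illustrative only: the weights below are invented placeholders,
and the real scoring happens via proxmox-perl-rs, not in this form):

use strict;
use warnings;
use List::Util qw(max sum);

# Pick the node with the lowest combined usage score for a service.
sub pick_node {
    my ($nodes, $service) = @_; # $nodes: { name => { cpu, mem, maxcpu, maxmem } }
    my ($best, $best_score);
    for my $candidate (sort keys %$nodes) {
        my (@cpu, @mem);
        for my $name (sort keys %$nodes) {
            my $n = $nodes->{$name};
            my ($c, $m) = ($n->{cpu}, $n->{mem});
            if ($name eq $candidate) { # as if the service already ran here
                $c += $service->{maxcpu};
                $m += $service->{maxmem};
            }
            push @cpu, $c / $n->{maxcpu};
            push @mem, $m / $n->{maxmem};
        }
        my $score = 4 * max(@mem) + 2 * sum(@mem) / @mem   # memory counts double
                  + 2 * max(@cpu) + sum(@cpu) / @cpu;      # max counts double
        ($best, $best_score) = ($candidate, $score)
            if !defined($best_score) || $score < $best_score;
    }
    return $best; # lowest combined usage wins
}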
-- 
2.30.2






* [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (15 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
  2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
  17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
  To: pve-devel

In the HA manager, the function recompute_online_node_usage() is
currently called very often, and the 'static' mode needs to read the
guest configs, which adds a bit of overhead.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 ha-manager.adoc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 038193f..710cbca 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -966,6 +966,9 @@ The number of active HA services on each node is used to choose a recovery node.
 Static
 ^^^^^^
 
+WARNING: The static mode is still a technology preview. It is not recommended to
+use it if you have thousands of HA-managed services.
+
 Static usage information from the HA services on each node is used to choose a
 recovery node.
 
-- 
2.30.2






* Re: [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
@ 2022-11-18  7:48   ` Fiona Ebner
  2022-11-18 12:48     ` Thomas Lamprecht
  0 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2022-11-18  7:48 UTC (permalink / raw)
  To: pve-devel

On 17.11.22 at 15:00, Fiona Ebner wrote:
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> New in v2.
> 
>  src/PVE/HA/Resources/PVECT.pm | 2 ++
>  src/PVE/HA/Resources/PVEVM.pm | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
> index 4c9530d..e77d98c 100644
> --- a/src/PVE/HA/Resources/PVECT.pm
> +++ b/src/PVE/HA/Resources/PVECT.pm
> @@ -3,6 +3,8 @@ package PVE::HA::Resources::PVECT;
>  use strict;
>  use warnings;
>  
> +use PVE::Cluster;
> +
>  use PVE::HA::Tools;
>  
>  BEGIN {

Might be better to add it to the BEGIN block here and not pull it in
during doc generation, in the spirit of a1c8862 ("buildsys: don't pull
qemu/lxc during doc-generation").
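
A rough, untested sketch of what is meant (the exact guard condition is an
assumption, taken to mirror the existing one from a1c8862):

BEGIN {
    if (!$ENV{PVE_GENERATING_DOCS}) { # assumed guard, as in a1c8862
        require PVE::Cluster;
    }
}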

> diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
> index 49e4a1d..f405d86 100644
> --- a/src/PVE/HA/Resources/PVEVM.pm
> +++ b/src/PVE/HA/Resources/PVEVM.pm
> @@ -3,6 +3,8 @@ package PVE::HA::Resources::PVEVM;
>  use strict;
>  use warnings;
>  
> +use PVE::Cluster;
> +
>  use PVE::HA::Tools;
>  
>  BEGIN {





* Re: [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
  2022-11-18  7:48   ` Fiona Ebner
@ 2022-11-18 12:48     ` Thomas Lamprecht
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Lamprecht @ 2022-11-18 12:48 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

On 18/11/2022 at 08:48, Fiona Ebner wrote:
>> +use PVE::Cluster;
>> +
>>  use PVE::HA::Tools;
>>  
>>  BEGIN {
> Might be better added to the BEGIN block here, and not pull it in for
> doc generation in the spirit of a1c8862 ("buildsys: don't pull qemu/lxc
> during doc-generation")
> 

Not relevant; we only do that for the pve-container & qemu-server
dependencies, as those have a cyclic dependency with pve-ha-manager, so
we only guard those to make bootstrapping easier.





* [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
  2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
                   ` (16 preceding siblings ...)
  2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
@ 2022-11-18 13:23 ` Thomas Lamprecht
  17 siblings, 0 replies; 21+ messages in thread
From: Thomas Lamprecht @ 2022-11-18 13:23 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner

On 17/11/2022 at 15:00, Fiona Ebner wrote:
> ha-manager:
> 
> Fiona Ebner (15):
>   env: add get_static_node_stats() method
>   resources: add get_static_stats() method
>   add Usage base plugin and Usage::Basic plugin
>   manager: select service node: add $sid to parameters
>   manager: online node usage: switch to Usage::Basic plugin
>   usage: add Usage::Static plugin
>   env: rename get_ha_settings to get_datacenter_settings
>   env: datacenter config: include crs (cluster-resource-scheduling)
>     setting
>   manager: set resource scheduler mode upon init
>   manager: use static resource scheduler when configured
>   manager: avoid scoring nodes if maintenance fallback node is valid
>   manager: avoid scoring nodes when not trying next and current node is
>     valid
>   usage: static: use service count on nodes as a fallback
>   test: add tests for static resource scheduling
>   resources: add missing PVE::Cluster use statements
> 

> 
> docs:
> 
> Fiona Ebner (2):
>   ha: add section about scheduler modes
>   ha: add warning against using 'static' mode with many services
> 
>  ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 


nice work! applied series, thanks!





Thread overview: 21+ messages
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
2022-11-18  7:48   ` Fiona Ebner
2022-11-18 12:48     ` Thomas Lamprecht
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
