* [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
@ 2022-11-17 14:00 Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
` (17 more replies)
0 siblings, 18 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.
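For reference, the tests in this series enable the mode via the
simulation environment's JSON-style datacenter.cfg, e.g.:

  {
      "crs": { "ha": "static" }
  }

On a real cluster this corresponds to the new 'crs' option in
/etc/pve/datacenter.cfg (roughly 'crs: ha=static'), assuming the
matching pve-manager/pve-cluster updates mentioned below.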
With this version, the effect is limited to choosing nodes during
recovery or for migrations triggered by a shutdown policy, but the
plan is to extend this in the future.
As a next step, it would be nice to also have this for service
startup, but AFAICT the issue is that node selection only happens
after the state is already set to 'started', and I think
select_service_node() currently doesn't know whether a service has
been newly started. I haven't looked into it in much detail though.
An idea to get a balancer out of it is to (a rough sketch follows below):
1. (optionally) sort all services by badness (needs new backend function)
2. iterate over the services, scoring the nodes for each one and adding
the usage to the chosen node after each iteration. The current node can
be kept if its score doesn't differ too much from the best node's.
3. record the chosen nodes and migrate the services accordingly.
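Roughly sketched in Perl (everything here is hypothetical: $services,
a $usage instance of the Usage plugin added below, get_badness() and
$threshold are placeholders, not part of this series):

  # 1. most problematic services first (get_badness() is hypothetical)
  my @sids = sort { get_badness($b) <=> get_badness($a) } keys $services->%*;

  my $moves = {};
  for my $sid (@sids) {
      my $current = $services->{$sid}->{node};

      # 2. lower scores are better (see the Usage plugin interface below)
      my $scores = $usage->score_nodes_to_start_service($sid, $current);
      my ($best) = sort { $scores->{$a} <=> $scores->{$b} || $a cmp $b } keys $scores->%*;

      my $keep_current = defined($scores->{$current})
          && $scores->{$current} - $scores->{$best} <= $threshold;
      my $target = $keep_current ? $current : $best;

      $usage->add_service_usage_to_node($target, $sid, $current);
      $moves->{$sid} = $target if $target ne $current;
  }
  # 3. finally, migrate each service in $moves to its chosen node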
The online node usage calculation is factored out into a 'Usage'
plugin system, to make it possible to add the new static mode without
cluttering the manager too much. If not all nodes provide their static
usage information, we fall back to the 'basic' mode. If only the
scoring fails, the service count is used as a fallback.
Dependency bumps needed:
pve-ha-manager (build-)depends on proxmox-perl-rs
The new feature is of course only usable with an updated pve-manager
and pve-cluster, but there is no hard dependency.
Changes from v1:
* Drop already applied patches.
* Add tests for the HA manager, which also required properly adding
the relevant methods to the simulation environment.
* Implement fallback for scoring in Usage/Static.pm.
* Improve documentation and mention the current limitation with many
services.
ha-manager:
Fiona Ebner (15):
env: add get_static_node_stats() method
resources: add get_static_stats() method
add Usage base plugin and Usage::Basic plugin
manager: select service node: add $sid to parameters
manager: online node usage: switch to Usage::Basic plugin
usage: add Usage::Static plugin
env: rename get_ha_settings to get_datacenter_settings
env: datacenter config: include crs (cluster-resource-scheduling)
setting
manager: set resource scheduler mode upon init
manager: use static resource scheduler when configured
manager: avoid scoring nodes if maintenance fallback node is valid
manager: avoid scoring nodes when not trying next and current node is
valid
usage: static: use service count on nodes as a fallback
test: add tests for static resource scheduling
resources: add missing PVE::Cluster use statements
debian/pve-ha-manager.install | 3 +
src/PVE/HA/Env.pm | 10 +-
src/PVE/HA/Env/PVE2.pm | 27 ++-
src/PVE/HA/LRM.pm | 4 +-
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Manager.pm | 79 +++++---
src/PVE/HA/Resources.pm | 5 +
src/PVE/HA/Resources/PVECT.pm | 13 ++
src/PVE/HA/Resources/PVEVM.pm | 16 ++
src/PVE/HA/Sim/Env.pm | 13 +-
src/PVE/HA/Sim/Hardware.pm | 28 +++
src/PVE/HA/Sim/Resources.pm | 10 +
src/PVE/HA/Usage.pm | 50 +++++
src/PVE/HA/Usage/Basic.pm | 52 ++++++
src/PVE/HA/Usage/Makefile | 6 +
src/PVE/HA/Usage/Static.pm | 120 ++++++++++++
src/test/test-crs-static1/README | 4 +
src/test/test-crs-static1/cmdlist | 4 +
src/test/test-crs-static1/datacenter.cfg | 6 +
src/test/test-crs-static1/hardware_status | 5 +
src/test/test-crs-static1/log.expect | 50 +++++
src/test/test-crs-static1/manager_status | 1 +
src/test/test-crs-static1/service_config | 3 +
.../test-crs-static1/static_service_stats | 3 +
src/test/test-crs-static2/README | 4 +
src/test/test-crs-static2/cmdlist | 20 ++
src/test/test-crs-static2/datacenter.cfg | 6 +
src/test/test-crs-static2/groups | 2 +
src/test/test-crs-static2/hardware_status | 7 +
src/test/test-crs-static2/log.expect | 171 ++++++++++++++++++
src/test/test-crs-static2/manager_status | 1 +
src/test/test-crs-static2/service_config | 3 +
.../test-crs-static2/static_service_stats | 3 +
src/test/test-crs-static3/README | 5 +
src/test/test-crs-static3/cmdlist | 4 +
src/test/test-crs-static3/datacenter.cfg | 9 +
src/test/test-crs-static3/hardware_status | 5 +
src/test/test-crs-static3/log.expect | 131 ++++++++++++++
src/test/test-crs-static3/manager_status | 1 +
src/test/test-crs-static3/service_config | 12 ++
.../test-crs-static3/static_service_stats | 12 ++
src/test/test-crs-static4/README | 6 +
src/test/test-crs-static4/cmdlist | 4 +
src/test/test-crs-static4/datacenter.cfg | 9 +
src/test/test-crs-static4/hardware_status | 5 +
src/test/test-crs-static4/log.expect | 149 +++++++++++++++
src/test/test-crs-static4/manager_status | 1 +
src/test/test-crs-static4/service_config | 12 ++
.../test-crs-static4/static_service_stats | 12 ++
src/test/test-crs-static5/README | 5 +
src/test/test-crs-static5/cmdlist | 4 +
src/test/test-crs-static5/datacenter.cfg | 9 +
src/test/test-crs-static5/hardware_status | 5 +
src/test/test-crs-static5/log.expect | 117 ++++++++++++
src/test/test-crs-static5/manager_status | 1 +
src/test/test-crs-static5/service_config | 10 +
.../test-crs-static5/static_service_stats | 11 ++
src/test/test_failover1.pl | 21 ++-
58 files changed, 1242 insertions(+), 50 deletions(-)
create mode 100644 src/PVE/HA/Usage.pm
create mode 100644 src/PVE/HA/Usage/Basic.pm
create mode 100644 src/PVE/HA/Usage/Makefile
create mode 100644 src/PVE/HA/Usage/Static.pm
create mode 100644 src/test/test-crs-static1/README
create mode 100644 src/test/test-crs-static1/cmdlist
create mode 100644 src/test/test-crs-static1/datacenter.cfg
create mode 100644 src/test/test-crs-static1/hardware_status
create mode 100644 src/test/test-crs-static1/log.expect
create mode 100644 src/test/test-crs-static1/manager_status
create mode 100644 src/test/test-crs-static1/service_config
create mode 100644 src/test/test-crs-static1/static_service_stats
create mode 100644 src/test/test-crs-static2/README
create mode 100644 src/test/test-crs-static2/cmdlist
create mode 100644 src/test/test-crs-static2/datacenter.cfg
create mode 100644 src/test/test-crs-static2/groups
create mode 100644 src/test/test-crs-static2/hardware_status
create mode 100644 src/test/test-crs-static2/log.expect
create mode 100644 src/test/test-crs-static2/manager_status
create mode 100644 src/test/test-crs-static2/service_config
create mode 100644 src/test/test-crs-static2/static_service_stats
create mode 100644 src/test/test-crs-static3/README
create mode 100644 src/test/test-crs-static3/cmdlist
create mode 100644 src/test/test-crs-static3/datacenter.cfg
create mode 100644 src/test/test-crs-static3/hardware_status
create mode 100644 src/test/test-crs-static3/log.expect
create mode 100644 src/test/test-crs-static3/manager_status
create mode 100644 src/test/test-crs-static3/service_config
create mode 100644 src/test/test-crs-static3/static_service_stats
create mode 100644 src/test/test-crs-static4/README
create mode 100644 src/test/test-crs-static4/cmdlist
create mode 100644 src/test/test-crs-static4/datacenter.cfg
create mode 100644 src/test/test-crs-static4/hardware_status
create mode 100644 src/test/test-crs-static4/log.expect
create mode 100644 src/test/test-crs-static4/manager_status
create mode 100644 src/test/test-crs-static4/service_config
create mode 100644 src/test/test-crs-static4/static_service_stats
create mode 100644 src/test/test-crs-static5/README
create mode 100644 src/test/test-crs-static5/cmdlist
create mode 100644 src/test/test-crs-static5/datacenter.cfg
create mode 100644 src/test/test-crs-static5/hardware_status
create mode 100644 src/test/test-crs-static5/log.expect
create mode 100644 src/test/test-crs-static5/manager_status
create mode 100644 src/test/test-crs-static5/service_config
create mode 100644 src/test/test-crs-static5/static_service_stats
docs:
Fiona Ebner (2):
ha: add section about scheduler modes
ha: add warning against using 'static' mode with many services
ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
` (16 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
to be used for static resource scheduling. In the simulation
environment, the information can be added in hardware_status.
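For example, a simulated node entry then carries cpus and memory next
to the existing fields, matching what the tests added later in this
series use:

  "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }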
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Properly add it to the simulation environment.
src/PVE/HA/Env.pm | 6 ++++++
src/PVE/HA/Env/PVE2.pm | 13 +++++++++++++
src/PVE/HA/Sim/Env.pm | 6 ++++++
src/PVE/HA/Sim/Hardware.pm | 13 +++++++++++++
4 files changed, 38 insertions(+)
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index ac569a9..00e3e3c 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -269,4 +269,10 @@ sub get_ha_settings {
return $self->{plug}->get_ha_settings();
}
+sub get_static_node_stats {
+ my ($self) = @_;
+
+ return $self->{plug}->get_static_node_stats();
+}
+
1;
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 5e0a683..7cecf35 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -5,6 +5,7 @@ use warnings;
use POSIX qw(:errno_h :fcntl_h);
use IO::File;
use IO::Socket::UNIX;
+use JSON;
use PVE::SafeSyslog;
use PVE::Tools;
@@ -459,4 +460,16 @@ sub get_ha_settings {
return $datacenterconfig->{ha};
}
+sub get_static_node_stats {
+ my ($self) = @_;
+
+ my $stats = PVE::Cluster::get_node_kv('static-info');
+ for my $node (keys $stats->%*) {
+ $stats->{$node} = eval { decode_json($stats->{$node}) };
+ $self->log('err', "unable to decode static node info for '$node' - $@") if $@;
+ }
+
+ return $stats;
+}
+
1;
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index b286708..6bd35b3 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -433,4 +433,10 @@ sub get_ha_settings {
return $datacenterconfig->{ha};
}
+sub get_static_node_stats {
+ my ($self) = @_;
+
+ return $self->{hardware}->get_static_node_stats();
+}
+
1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index 96a4064..e38561a 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -858,4 +858,17 @@ sub watchdog_update {
return &$modify_watchog($self, $code);
}
+sub get_static_node_stats {
+ my ($self) = @_;
+
+ my $cstatus = $self->read_hardware_status_nolock();
+
+ my $stats = {};
+ for my $node (keys $cstatus->%*) {
+ $stats->{$node} = { $cstatus->{$node}->%{qw(cpus memory)} };
+ }
+
+ return $stats;
+}
+
1;
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
` (15 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
to be used for static resource scheduling.
In a container's vmstatus(), the 'cores' option takes precedence over
the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
here.
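So for a container, the static stats end up being derived as in the
PVECT hunk below:

  maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
  maxmem => ($conf->{memory} || 512) * 1024 * 1024,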
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Properly add it to the simulation environment.
src/PVE/HA/Resources.pm | 5 +++++
src/PVE/HA/Resources/PVECT.pm | 11 +++++++++++
src/PVE/HA/Resources/PVEVM.pm | 14 ++++++++++++++
src/PVE/HA/Sim/Hardware.pm | 15 +++++++++++++++
src/PVE/HA/Sim/Resources.pm | 10 ++++++++++
5 files changed, 55 insertions(+)
diff --git a/src/PVE/HA/Resources.pm b/src/PVE/HA/Resources.pm
index 835c314..7ba90f6 100644
--- a/src/PVE/HA/Resources.pm
+++ b/src/PVE/HA/Resources.pm
@@ -161,6 +161,11 @@ sub remove_locks {
die "implement in subclass";
}
+sub get_static_stats {
+ my ($class, $haenv, $id, $service_node) = @_;
+
+ die "implement in subclass";
+}
# package PVE::HA::Resources::IPAddr;
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index 015faf3..4c9530d 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -150,4 +150,15 @@ sub remove_locks {
return undef;
}
+sub get_static_stats {
+ my ($class, $haenv, $id, $service_node) = @_;
+
+ my $conf = PVE::LXC::Config->load_config($id, $service_node);
+
+ return {
+ maxcpu => $conf->{cpulimit} || $conf->{cores} || 0,
+ maxmem => ($conf->{memory} || 512) * 1024 * 1024,
+ };
+}
+
1;
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index 58c83e0..49e4a1d 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -173,4 +173,18 @@ sub remove_locks {
return undef;
}
+sub get_static_stats {
+ my ($class, $haenv, $id, $service_node) = @_;
+
+ my $conf = PVE::QemuConfig->load_config($id, $service_node);
+ my $defaults = PVE::QemuServer::load_defaults();
+
+ my $cpus = ($conf->{sockets} || $defaults->{sockets}) * ($conf->{cores} || $defaults->{cores});
+
+ return {
+ maxcpu => $conf->{vcpus} || $cpus,
+ maxmem => ($conf->{memory} || $defaults->{memory}) * 1024 * 1024,
+ };
+}
+
1;
diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm
index e38561a..e33a4c5 100644
--- a/src/PVE/HA/Sim/Hardware.pm
+++ b/src/PVE/HA/Sim/Hardware.pm
@@ -29,6 +29,7 @@ my $watchdog_timeout = 60;
# $testdir/hardware_status Hardware description (number of nodes, ...)
# $testdir/manager_status CRM status (start with {})
# $testdir/service_config Service configuration
+# $testdir/static_service_stats Static service usage information (cpu, memory)
# $testdir/groups HA groups configuration
# $testdir/service_status_<node> Service status
# $testdir/datacenter.cfg Datacenter wide HA configuration
@@ -38,6 +39,7 @@ my $watchdog_timeout = 60;
#
# $testdir/status/cluster_locks Cluster locks
# $testdir/status/hardware_status Hardware status (power/network on/off)
+# $testdir/status/static_service_stats Static service usage information (cpu, memory)
# $testdir/status/watchdog_status Watchdog status
#
# runtime status
@@ -330,6 +332,15 @@ sub write_service_status {
return $res;
}
+sub read_static_service_stats {
+ my ($self) = @_;
+
+ my $filename = "$self->{statusdir}/static_service_stats";
+ my $stats = PVE::HA::Tools::read_json_from_file($filename);
+
+ return $stats;
+}
+
my $default_group_config = <<__EOD;
group: prefer_node1
nodes node1
@@ -404,6 +415,10 @@ sub new {
copy("$testdir/datacenter.cfg", "$statusdir/datacenter.cfg");
}
+ if (-f "$testdir/static_service_stats") {
+ copy("$testdir/static_service_stats", "$statusdir/static_service_stats");
+ }
+
my $cstatus = $self->read_hardware_status_nolock();
foreach my $node (sort keys %$cstatus) {
diff --git a/src/PVE/HA/Sim/Resources.pm b/src/PVE/HA/Sim/Resources.pm
index bccc0e6..e6e1853 100644
--- a/src/PVE/HA/Sim/Resources.pm
+++ b/src/PVE/HA/Sim/Resources.pm
@@ -139,4 +139,14 @@ sub remove_locks {
return undef;
}
+sub get_static_stats {
+ my ($class, $haenv, $id, $service_node) = @_;
+
+ my $sid = $class->type() . ":$id";
+ my $hardware = $haenv->hardware();
+
+ my $stats = $hardware->read_static_service_stats();
+ return $stats->{$sid};
+}
+
1;
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
` (14 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
in preparation for also supporting static resource scheduling via
another such Usage plugin.
The interface is designed in anticipation of the Usage::Static plugin;
the Usage::Basic plugin doesn't require all parameters.
In Usage::Static, the $haenv will be necessary for logging and for
getting the static node stats. add_service_usage_to_node() and
score_nodes_to_start_service() take the sid and service node, and the
former additionally takes the optional migration target (during a
migration it's not clear whether the config file has already been
moved or not), to be able to get the static service stats.
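To illustrate the intended call flow (loosely following how the
manager ends up using the plugin in the later patches of this series):

  my $usage = PVE::HA::Usage::Basic->new($haenv);
  $usage->add_node($_) for $online_nodes->@*;

  # account for a service currently running on $node
  $usage->add_service_usage_to_node($node, $sid, $node);

  # returns a { $nodename => $score } hash, lower is better
  my $scores = $usage->score_nodes_to_start_service($sid, $current_node);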
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes from v1.
debian/pve-ha-manager.install | 2 ++
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Usage.pm | 49 +++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Basic.pm | 52 +++++++++++++++++++++++++++++++++++
src/PVE/HA/Usage/Makefile | 6 ++++
5 files changed, 111 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage.pm
create mode 100644 src/PVE/HA/Usage/Basic.pm
create mode 100644 src/PVE/HA/Usage/Makefile
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 33a5c58..87fb24c 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -33,5 +33,7 @@
/usr/share/perl5/PVE/HA/Resources/PVECT.pm
/usr/share/perl5/PVE/HA/Resources/PVEVM.pm
/usr/share/perl5/PVE/HA/Tools.pm
+/usr/share/perl5/PVE/HA/Usage.pm
+/usr/share/perl5/PVE/HA/Usage/Basic.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Makefile b/src/PVE/HA/Makefile
index c366f6c..8c91b97 100644
--- a/src/PVE/HA/Makefile
+++ b/src/PVE/HA/Makefile
@@ -1,5 +1,5 @@
SIM_SOURCES=CRM.pm Env.pm Groups.pm Resources.pm LRM.pm Manager.pm \
- NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm
+ NodeStatus.pm Tools.pm FenceConfig.pm Fence.pm Usage.pm
SOURCES=${SIM_SOURCES} Config.pm
@@ -8,6 +8,7 @@ install:
install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA
for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/$$i; done
make -C Resources install
+ make -C Usage install
make -C Env install
.PHONY: installsim
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
new file mode 100644
index 0000000..4c723d1
--- /dev/null
+++ b/src/PVE/HA/Usage.pm
@@ -0,0 +1,49 @@
+package PVE::HA::Usage;
+
+use strict;
+use warnings;
+
+sub new {
+ my ($class, $haenv) = @_;
+
+ die "implement in subclass";
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ die "implement in subclass";
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ die "implement in subclass";
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ die "implement in subclass";
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ die "implement in subclass";
+}
+
+sub add_service_usage_to_node {
+ my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+ die "implement in subclass";
+}
+
+# Returns a hash with $nodename => $score pairs. A lower $score is better.
+sub score_nodes_to_start_service {
+ my ($self, $sid, $service_node) = @_;
+
+ die "implement in subclass";
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Basic.pm b/src/PVE/HA/Usage/Basic.pm
new file mode 100644
index 0000000..f066350
--- /dev/null
+++ b/src/PVE/HA/Usage/Basic.pm
@@ -0,0 +1,52 @@
+package PVE::HA::Usage::Basic;
+
+use strict;
+use warnings;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv) = @_;
+
+ return bless {
+ nodes => {},
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ $self->{nodes}->{$nodename} = 0;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ delete $self->{nodes}->{$nodename};
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return keys $self->{nodes}->%*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return defined($self->{nodes}->{$nodename});
+}
+
+sub add_service_usage_to_node {
+ my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+ $self->{nodes}->{$nodename}++;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid, $service_node) = @_;
+
+ return $self->{nodes};
+}
+
+1;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
new file mode 100644
index 0000000..ccf1282
--- /dev/null
+++ b/src/PVE/HA/Usage/Makefile
@@ -0,0 +1,6 @@
+SOURCES=Basic.pm
+
+.PHONY: install
+install:
+ install -d -m 0755 ${DESTDIR}${PERLDIR}/PVE/HA/Usage
+ for i in ${SOURCES}; do install -D -m 0644 $$i ${DESTDIR}${PERLDIR}/PVE/HA/Usage/$$i; done
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (2 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
` (13 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
In preparation for scheduling based on static information, where the
scoring of nodes depends on information from the service's
VM/CT configuration file (and the $sid is required to query that).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes from v1.
src/PVE/HA/Manager.pm | 4 +++-
src/test/test_failover1.pl | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 518f64f..63c94af 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -119,7 +119,7 @@ sub get_node_priority_groups {
}
sub select_service_node {
- my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
+ my ($groups, $online_node_usage, $sid, $service_conf, $current_node, $try_next, $tried_nodes, $maintenance_fallback) = @_;
my $group = get_service_group($groups, $online_node_usage, $service_conf);
@@ -766,6 +766,7 @@ sub next_state_started {
my $node = select_service_node(
$self->{groups},
$self->{online_node_usage},
+ $sid,
$cd,
$sd->{node},
$try_next,
@@ -847,6 +848,7 @@ sub next_state_recovery {
my $recovery_node = select_service_node(
$self->{groups},
$self->{online_node_usage},
+ $sid,
$cd,
$sd->{node},
);
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index 67573a2..f11d1a6 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -30,7 +30,7 @@ sub test {
my ($expected_node, $try_next) = @_;
my $node = PVE::HA::Manager::select_service_node
- ($groups, $online_node_usage, $service_conf, $current_node, $try_next);
+ ($groups, $online_node_usage, "vm:111", $service_conf, $current_node, $try_next);
my (undef, undef, $line) = caller();
die "unexpected result: $node != ${expected_node} at line $line\n"
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (3 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
` (12 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
no functional change is intended.
One test needs adaptation too, because it created its own version of
$online_node_usage.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes from v1.
src/PVE/HA/Manager.pm | 35 +++++++++++++++++------------------
src/test/test_failover1.pl | 19 ++++++++++---------
2 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63c94af..63e6c8a 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -7,6 +7,7 @@ use Digest::MD5 qw(md5_base64);
use PVE::Tools;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
+use PVE::HA::Usage::Basic;
## Variable Name & Abbreviations Convention
#
@@ -77,9 +78,7 @@ sub get_service_group {
my $group = {};
# add all online nodes to default group to allow try_next when no group set
- foreach my $node (keys %$online_node_usage) {
- $group->{nodes}->{$node} = 1;
- }
+ $group->{nodes}->{$_} = 1 for $online_node_usage->list_nodes();
# overwrite default if service is bound to a specific group
if (my $group_id = $service_conf->{group}) {
@@ -100,7 +99,7 @@ sub get_node_priority_groups {
if ($entry =~ m/^(\S+):(\d+)$/) {
($node, $pri) = ($1, $2);
}
- next if !defined($online_node_usage->{$node}); # offline
+ next if !$online_node_usage->contains_node($node); # offline
$pri_groups->{$pri}->{$node} = 1;
$group_members->{$node} = $pri;
}
@@ -108,7 +107,7 @@ sub get_node_priority_groups {
# add non-group members to unrestricted groups (priority -1)
if (!$group->{restricted}) {
my $pri = -1;
- foreach my $node (keys %$online_node_usage) {
+ for my $node ($online_node_usage->list_nodes()) {
next if defined($group_members->{$node});
$pri_groups->{$pri}->{$node} = 1;
$group_members->{$node} = -1;
@@ -144,8 +143,9 @@ sub select_service_node {
}
}
+ my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
- $online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b
+ $scores->{$a} <=> $scores->{$b} || $a cmp $b
} keys %{$pri_groups->{$top_pri}};
my $found;
@@ -201,39 +201,38 @@ my $valid_service_states = {
sub recompute_online_node_usage {
my ($self) = @_;
- my $online_node_usage = {};
+ my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
my $online_nodes = $self->{ns}->list_online_nodes();
- foreach my $node (@$online_nodes) {
- $online_node_usage->{$node} = 0;
- }
+ $online_node_usage->add_node($_) for $online_nodes->@*;
foreach my $sid (keys %{$self->{ss}}) {
my $sd = $self->{ss}->{$sid};
my $state = $sd->{state};
my $target = $sd->{target}; # optional
- if (defined($online_node_usage->{$sd->{node}})) {
+ if ($online_node_usage->contains_node($sd->{node})) {
if (
$state eq 'started' || $state eq 'request_stop' || $state eq 'fence' ||
$state eq 'freeze' || $state eq 'error' || $state eq 'recovery'
) {
- $online_node_usage->{$sd->{node}}++;
+ $online_node_usage->add_service_usage_to_node($sd->{node}, $sid, $sd->{node});
} elsif (($state eq 'migrate') || ($state eq 'relocate')) {
+ my $source = $sd->{node};
# count it for both, source and target as load is put on both
- $online_node_usage->{$sd->{node}}++;
- $online_node_usage->{$target}++;
+ $online_node_usage->add_service_usage_to_node($source, $sid, $source, $target);
+ $online_node_usage->add_service_usage_to_node($target, $sid, $source, $target);
} elsif ($state eq 'stopped') {
# do nothing
} else {
die "should not be reached (sid = '$sid', state = '$state')";
}
- } elsif (defined($target) && defined($online_node_usage->{$target})) {
+ } elsif (defined($target) && $online_node_usage->contains_node($target)) {
if ($state eq 'migrate' || $state eq 'relocate') {
# to correctly track maintenance modi and also consider the target as used for the
# case a node dies, as we cannot really know if the to-be-aborted incoming migration
# has already cleaned up all used resources
- $online_node_usage->{$target}++;
+ $online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node}, $target);
}
}
}
@@ -775,7 +774,7 @@ sub next_state_started {
);
if ($node && ($sd->{node} ne $node)) {
- $self->{online_node_usage}->{$node}++;
+ $self->{online_node_usage}->add_service_usage_to_node($node, $sid, $sd->{node});
if (defined(my $fallback = $sd->{maintenance_node})) {
if ($node eq $fallback) {
@@ -864,7 +863,7 @@ sub next_state_recovery {
$fence_recovery_cleanup->($self, $sid, $fenced_node);
$haenv->steal_service($sid, $sd->{node}, $recovery_node);
- $self->{online_node_usage}->{$recovery_node}++;
+ $self->{online_node_usage}->add_service_usage_to_node($recovery_node, $sid, $recovery_node);
# NOTE: $sd *is normally read-only*, fencing is the exception
$cd->{node} = $sd->{node} = $recovery_node;
diff --git a/src/test/test_failover1.pl b/src/test/test_failover1.pl
index f11d1a6..308eab3 100755
--- a/src/test/test_failover1.pl
+++ b/src/test/test_failover1.pl
@@ -6,6 +6,7 @@ use warnings;
use lib '..';
use PVE::HA::Groups;
use PVE::HA::Manager;
+use PVE::HA::Usage::Basic;
my $groups = PVE::HA::Groups->parse_config("groups.tmp", <<EOD);
group: prefer_node1
@@ -13,11 +14,11 @@ group: prefer_node1
EOD
-my $online_node_usage = {
- node1 => 0,
- node2 => 0,
- node3 => 0,
-};
+# Relies on the fact that the basic plugin doesn't use the haenv.
+my $online_node_usage = PVE::HA::Usage::Basic->new();
+$online_node_usage->add_node("node1");
+$online_node_usage->add_node("node2");
+$online_node_usage->add_node("node3");
my $service_conf = {
node => 'node1',
@@ -43,22 +44,22 @@ sub test {
test('node1');
test('node1', 1);
-delete $online_node_usage->{node1}; # poweroff
+$online_node_usage->remove_node("node1"); # poweroff
test('node2');
test('node3', 1);
test('node2', 1);
-delete $online_node_usage->{node2}; # poweroff
+$online_node_usage->remove_node("node2"); # poweroff
test('node3');
test('node3', 1);
-$online_node_usage->{node1} = 0; # poweron
+$online_node_usage->add_node("node1"); # poweron
test('node1');
-$online_node_usage->{node2} = 0; # poweron
+$online_node_usage->add_node("node2"); # poweron
test('node1');
test('node1', 1);
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (4 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
` (11 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
for calculating node usage of services based upon static CPU and
memory configuration as well as scoring the nodes with that
information to decide where to start a new or recovered service.
For getting the service stats, it's necessary to also consider the
migration target (if present), because the configuration file might
already have been moved.
It's necessary to update the cluster filesystem upon stealing the
service to be able to always read the moved config right away when
adding the usage.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Pass haenv to resource's get_static_stats(), required by
simulation env.
debian/pve-ha-manager.install | 1 +
src/PVE/HA/Env/PVE2.pm | 4 ++
src/PVE/HA/Usage.pm | 1 +
src/PVE/HA/Usage/Makefile | 2 +-
src/PVE/HA/Usage/Static.pm | 114 ++++++++++++++++++++++++++++++++++
5 files changed, 121 insertions(+), 1 deletion(-)
create mode 100644 src/PVE/HA/Usage/Static.pm
diff --git a/debian/pve-ha-manager.install b/debian/pve-ha-manager.install
index 87fb24c..a7598a9 100644
--- a/debian/pve-ha-manager.install
+++ b/debian/pve-ha-manager.install
@@ -35,5 +35,6 @@
/usr/share/perl5/PVE/HA/Tools.pm
/usr/share/perl5/PVE/HA/Usage.pm
/usr/share/perl5/PVE/HA/Usage/Basic.pm
+/usr/share/perl5/PVE/HA/Usage/Static.pm
/usr/share/perl5/PVE/Service/pve_ha_crm.pm
/usr/share/perl5/PVE/Service/pve_ha_lrm.pm
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7cecf35..7fac43c 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -176,6 +176,10 @@ sub steal_service {
} else {
die "implement me";
}
+
+ # Necessary for (at least) static usage plugin to always be able to read service config from new
+ # node right away.
+ $self->cluster_state_update();
}
sub read_group_config {
diff --git a/src/PVE/HA/Usage.pm b/src/PVE/HA/Usage.pm
index 4c723d1..66d9572 100644
--- a/src/PVE/HA/Usage.pm
+++ b/src/PVE/HA/Usage.pm
@@ -33,6 +33,7 @@ sub contains_node {
die "implement in subclass";
}
+# Logs a warning to $haenv upon failure, but does not die.
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
diff --git a/src/PVE/HA/Usage/Makefile b/src/PVE/HA/Usage/Makefile
index ccf1282..5a51359 100644
--- a/src/PVE/HA/Usage/Makefile
+++ b/src/PVE/HA/Usage/Makefile
@@ -1,4 +1,4 @@
-SOURCES=Basic.pm
+SOURCES=Basic.pm Static.pm
.PHONY: install
install:
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
new file mode 100644
index 0000000..ce705eb
--- /dev/null
+++ b/src/PVE/HA/Usage/Static.pm
@@ -0,0 +1,114 @@
+package PVE::HA::Usage::Static;
+
+use strict;
+use warnings;
+
+use PVE::HA::Resources;
+use PVE::RS::ResourceScheduling::Static;
+
+use base qw(PVE::HA::Usage);
+
+sub new {
+ my ($class, $haenv) = @_;
+
+ my $node_stats = eval { $haenv->get_static_node_stats() };
+ die "did not get static node usage information - $@" if $@;
+
+ my $scheduler = eval { PVE::RS::ResourceScheduling::Static->new(); };
+ die "unable to initialize static scheduling - $@" if $@;
+
+ return bless {
+ 'node-stats' => $node_stats,
+ 'service-stats' => {},
+ haenv => $haenv,
+ scheduler => $scheduler,
+ }, $class;
+}
+
+sub add_node {
+ my ($self, $nodename) = @_;
+
+ my $stats = $self->{'node-stats'}->{$nodename}
+ or die "did not get static node usage information for '$nodename'\n";
+ die "static node usage information for '$nodename' missing cpu count\n" if !$stats->{cpus};
+ die "static node usage information for '$nodename' missing memory\n" if !$stats->{memory};
+
+ eval { $self->{scheduler}->add_node($nodename, int($stats->{cpus}), int($stats->{memory})); };
+ die "initializing static node usage for '$nodename' failed - $@" if $@;
+}
+
+sub remove_node {
+ my ($self, $nodename) = @_;
+
+ $self->{scheduler}->remove_node($nodename);
+}
+
+sub list_nodes {
+ my ($self) = @_;
+
+ return $self->{scheduler}->list_nodes()->@*;
+}
+
+sub contains_node {
+ my ($self, $nodename) = @_;
+
+ return $self->{scheduler}->contains_node($nodename);
+}
+
+my sub get_service_usage {
+ my ($self, $sid, $service_node, $migration_target) = @_;
+
+ return $self->{'service-stats'}->{$sid} if $self->{'service-stats'}->{$sid};
+
+ my (undef, $type, $id) = $self->{haenv}->parse_sid($sid);
+ my $plugin = PVE::HA::Resources->lookup($type);
+
+ my $stats = eval { $plugin->get_static_stats($self->{haenv}, $id, $service_node); };
+ if (my $err = $@) {
+ # config might've already moved during a migration
+ $stats = eval { $plugin->get_static_stats($self->{haenv}, $id, $migration_target); } if $migration_target;
+ die "did not get static service usage information for '$sid' - $err\n" if !$stats;
+ }
+
+ my $service_stats = {
+ maxcpu => $stats->{maxcpu} + 0.0, # containers allow non-integer cpulimit
+ maxmem => int($stats->{maxmem}),
+ };
+
+ $self->{'service-stats'}->{$sid} = $service_stats;
+
+ return $service_stats;
+}
+
+sub add_service_usage_to_node {
+ my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+
+ eval {
+ my $service_usage = get_service_usage($self, $sid, $service_node, $migration_target);
+ $self->{scheduler}->add_service_usage_to_node($nodename, $service_usage);
+ };
+ $self->{haenv}->log('warning', "unable to add service '$sid' usage to node '$nodename' - $@")
+ if $@;
+}
+
+sub score_nodes_to_start_service {
+ my ($self, $sid, $service_node) = @_;
+
+ my $score_list = eval {
+ my $service_usage = get_service_usage($self, $sid, $service_node);
+ $self->{scheduler}->score_nodes_to_start_service($service_usage);
+ };
+ if (my $err = $@) {
+ $self->{haenv}->log(
+ 'err',
+ "unable to score nodes according to static usage for service '$sid' - $err",
+ );
+ # TODO maybe use service count as fallback?
+ return { map { $_ => 1 } $self->list_nodes() };
+ }
+
+ # Take minus the value, so that a lower score is better, which our caller(s) expect(s).
+ return { map { $_->[0] => -$_->[1] } $score_list->@* };
+}
+
+1;
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (5 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
` (10 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
The method will be extended to include other HA-relevant settings from
datacenter.cfg.
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
src/PVE/HA/Env.pm | 4 ++--
src/PVE/HA/Env/PVE2.pm | 2 +-
src/PVE/HA/LRM.pm | 2 +-
src/PVE/HA/Sim/Env.pm | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm
index 00e3e3c..16603ec 100644
--- a/src/PVE/HA/Env.pm
+++ b/src/PVE/HA/Env.pm
@@ -263,10 +263,10 @@ sub get_max_workers {
}
# return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
my ($self) = @_;
- return $self->{plug}->get_ha_settings();
+ return $self->{plug}->get_datacenter_settings();
}
sub get_static_node_stats {
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index 7fac43c..d2c46e8 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -452,7 +452,7 @@ sub get_max_workers {
}
# return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
my ($self) = @_;
my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') };
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 8cbdb82..7750f4d 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -59,7 +59,7 @@ sub shutdown_request {
my ($shutdown, $reboot) = $haenv->is_node_shutdown();
- my $dc_ha_cfg = $haenv->get_ha_settings();
+ my $dc_ha_cfg = $haenv->get_datacenter_settings();
my $shutdown_policy = $dc_ha_cfg->{shutdown_policy} // 'conditional';
if ($shutdown) { # don't log this on service restart, only on node shutdown
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 6bd35b3..6c47030 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -425,7 +425,7 @@ sub get_max_workers {
}
# return cluster wide enforced HA settings
-sub get_ha_settings {
+sub get_datacenter_settings {
my ($self) = @_;
my $datacenterconfig = $self->{hardware}->read_datacenter_conf();
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (6 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
` (9 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Extend existing method rather than introducing a new one.
src/PVE/HA/Env/PVE2.pm | 10 +++++-----
src/PVE/HA/LRM.pm | 4 ++--
src/PVE/HA/Sim/Env.pm | 5 ++++-
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
index d2c46e8..f6ebfeb 100644
--- a/src/PVE/HA/Env/PVE2.pm
+++ b/src/PVE/HA/Env/PVE2.pm
@@ -456,12 +456,12 @@ sub get_datacenter_settings {
my ($self) = @_;
my $datacenterconfig = eval { cfs_read_file('datacenter.cfg') };
- if (my $err = $@) {
- $self->log('err', "unable to get HA settings from datacenter.cfg - $err");
- return {};
- }
+ $self->log('err', "unable to get HA settings from datacenter.cfg - $@") if $@;
- return $datacenterconfig->{ha};
+ return {
+ ha => $datacenterconfig->{ha} // {},
+ crs => $datacenterconfig->{crs} // {},
+ };
}
sub get_static_node_stats {
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 7750f4d..5d2fa2c 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -59,8 +59,8 @@ sub shutdown_request {
my ($shutdown, $reboot) = $haenv->is_node_shutdown();
- my $dc_ha_cfg = $haenv->get_datacenter_settings();
- my $shutdown_policy = $dc_ha_cfg->{shutdown_policy} // 'conditional';
+ my $dc_cfg = $haenv->get_datacenter_settings();
+ my $shutdown_policy = $dc_cfg->{ha}->{shutdown_policy} // 'conditional';
if ($shutdown) { # don't log this on service restart, only on node shutdown
$haenv->log('info', "got shutdown request with shutdown policy '$shutdown_policy'");
diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
index 6c47030..c6ea73c 100644
--- a/src/PVE/HA/Sim/Env.pm
+++ b/src/PVE/HA/Sim/Env.pm
@@ -430,7 +430,10 @@ sub get_datacenter_settings {
my $datacenterconfig = $self->{hardware}->read_datacenter_conf();
- return $datacenterconfig->{ha};
+ return {
+ ha => $datacenterconfig->{ha} // {},
+ crs => $datacenterconfig->{crs} // {},
+ };
}
sub get_static_node_stats {
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (7 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
` (8 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Switch to get_datacenter_settings() replacing the previous
get_crs_settings() in v1.
src/PVE/HA/Manager.pm | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 63e6c8a..1638442 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -52,6 +52,11 @@ sub new {
$self->{ms} = { master_node => $haenv->nodename() };
+ my $dc_cfg = $haenv->get_datacenter_settings();
+ $self->{'scheduler-mode'} = $dc_cfg->{crs}->{ha} ? $dc_cfg->{crs}->{ha} : 'basic';
+ $haenv->log('info', "using scheduler mode '$self->{'scheduler-mode'}'")
+ if $self->{'scheduler-mode'} ne 'basic';
+
return $self;
}
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (8 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
` (7 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.
Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds
Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations with hundreds of services (and low thousands of services
overall), and with generous estimates it could even run into the
watchdog timer.
Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Add fixme note about overhead.
* Add benchmark results to commit message.
src/PVE/HA/Manager.pm | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1638442..7f1d1d7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,7 @@ use PVE::Tools;
use PVE::HA::Tools ':exit_codes';
use PVE::HA::NodeStatus;
use PVE::HA::Usage::Basic;
+use PVE::HA::Usage::Static;
## Variable Name & Abbreviations Convention
#
@@ -203,14 +204,35 @@ my $valid_service_states = {
error => 1,
};
+# FIXME with 'static' mode and thousands of services, the overhead can be noticeable and the fact
+# that this function is called for each state change and upon recovery doesn't help.
sub recompute_online_node_usage {
my ($self) = @_;
- my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
+ my $haenv = $self->{haenv};
my $online_nodes = $self->{ns}->list_online_nodes();
- $online_node_usage->add_node($_) for $online_nodes->@*;
+ my $online_node_usage;
+
+ if (my $mode = $self->{'scheduler-mode'}) {
+ if ($mode eq 'static') {
+ $online_node_usage = eval {
+ my $scheduler = PVE::HA::Usage::Static->new($haenv);
+ $scheduler->add_node($_) for $online_nodes->@*;
+ return $scheduler;
+ };
+ $haenv->log('warning', "using 'basic' scheduler mode, init for 'static' failed - $@")
+ if $@;
+ } elsif ($mode ne 'basic') {
+ $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
+ }
+ }
+
+ if (!$online_node_usage) {
+ $online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+ $online_node_usage->add_node($_) for $online_nodes->@*;
+ }
foreach my $sid (keys %{$self->{ss}}) {
my $sd = $self->{ss}->{$sid};
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (9 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
` (6 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes from v1.
src/PVE/HA/Manager.pm | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 7f1d1d7..cc2ada4 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -149,25 +149,20 @@ sub select_service_node {
}
}
+ return $maintenance_fallback
+ if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
} keys %{$pri_groups->{$top_pri}};
my $found;
- my $found_maintenance_fallback;
for (my $i = scalar(@nodes) - 1; $i >= 0; $i--) {
my $node = $nodes[$i];
if ($node eq $current_node) {
$found = $i;
}
- if (defined($maintenance_fallback) && $node eq $maintenance_fallback) {
- $found_maintenance_fallback = $i;
- }
- }
-
- if (defined($found_maintenance_fallback)) {
- return $nodes[$found_maintenance_fallback];
}
if ($try_next) {
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current node is valid
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (10 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
` (5 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.
This should cover most calls of select_service_node().
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
No changes from v1.
src/PVE/HA/Manager.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index cc2ada4..69bfbc3 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -152,6 +152,8 @@ sub select_service_node {
return $maintenance_fallback
if defined($maintenance_fallback) && $pri_groups->{$top_pri}->{$maintenance_fallback};
+ return $current_node if !$try_next && $pri_groups->{$top_pri}->{$current_node};
+
my $scores = $online_node_usage->score_nodes_to_start_service($sid, $current_node);
my @nodes = sort {
$scores->{$a} <=> $scores->{$b} || $a cmp $b
@@ -171,8 +173,6 @@ sub select_service_node {
} else {
return $nodes[0];
}
- } elsif (defined($found)) {
- return $nodes[$found];
} else {
return $nodes[0];
}
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (11 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
` (4 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
if something goes wrong with the TOPSIS scoring. Not expected to
happen, but it's rather cheap to be on the safe side.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
src/PVE/HA/Usage/Static.pm | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/PVE/HA/Usage/Static.pm b/src/PVE/HA/Usage/Static.pm
index ce705eb..73ce836 100644
--- a/src/PVE/HA/Usage/Static.pm
+++ b/src/PVE/HA/Usage/Static.pm
@@ -22,12 +22,15 @@ sub new {
'service-stats' => {},
haenv => $haenv,
scheduler => $scheduler,
+ 'service-counts' => {}, # Service count on each node. Fallback if scoring calculation fails.
}, $class;
}
sub add_node {
my ($self, $nodename) = @_;
+ $self->{'service-counts'}->{$nodename} = 0;
+
my $stats = $self->{'node-stats'}->{$nodename}
or die "did not get static node usage information for '$nodename'\n";
die "static node usage information for '$nodename' missing cpu count\n" if !$stats->{cpus};
@@ -40,6 +43,8 @@ sub add_node {
sub remove_node {
my ($self, $nodename) = @_;
+ delete $self->{'service-counts'}->{$nodename};
+
$self->{scheduler}->remove_node($nodename);
}
@@ -83,6 +88,8 @@ my sub get_service_usage {
sub add_service_usage_to_node {
my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
+ $self->{'service-counts'}->{$nodename}++;
+
eval {
my $service_usage = get_service_usage($self, $sid, $service_node, $migration_target);
$self->{scheduler}->add_service_usage_to_node($nodename, $service_usage);
@@ -103,8 +110,7 @@ sub score_nodes_to_start_service {
'err',
"unable to score nodes according to static usage for service '$sid' - $err",
);
- # TODO maybe use service count as fallback?
- return { map { $_ => 1 } $self->list_nodes() };
+ return $self->{'service-counts'};
}
# Take minus the value, so that a lower score is better, which our caller(s) expect(s).
--
2.30.2
* [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (12 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
` (3 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
See the READMEs for more information about the tests.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
src/test/test-crs-static1/README | 4 +
src/test/test-crs-static1/cmdlist | 4 +
src/test/test-crs-static1/datacenter.cfg | 6 +
src/test/test-crs-static1/hardware_status | 5 +
src/test/test-crs-static1/log.expect | 50 +++++
src/test/test-crs-static1/manager_status | 1 +
src/test/test-crs-static1/service_config | 3 +
.../test-crs-static1/static_service_stats | 3 +
src/test/test-crs-static2/README | 4 +
src/test/test-crs-static2/cmdlist | 20 ++
src/test/test-crs-static2/datacenter.cfg | 6 +
src/test/test-crs-static2/groups | 2 +
src/test/test-crs-static2/hardware_status | 7 +
src/test/test-crs-static2/log.expect | 171 ++++++++++++++++++
src/test/test-crs-static2/manager_status | 1 +
src/test/test-crs-static2/service_config | 3 +
.../test-crs-static2/static_service_stats | 3 +
src/test/test-crs-static3/README | 5 +
src/test/test-crs-static3/cmdlist | 4 +
src/test/test-crs-static3/datacenter.cfg | 9 +
src/test/test-crs-static3/hardware_status | 5 +
src/test/test-crs-static3/log.expect | 131 ++++++++++++++
src/test/test-crs-static3/manager_status | 1 +
src/test/test-crs-static3/service_config | 12 ++
.../test-crs-static3/static_service_stats | 12 ++
src/test/test-crs-static4/README | 6 +
src/test/test-crs-static4/cmdlist | 4 +
src/test/test-crs-static4/datacenter.cfg | 9 +
src/test/test-crs-static4/hardware_status | 5 +
src/test/test-crs-static4/log.expect | 149 +++++++++++++++
src/test/test-crs-static4/manager_status | 1 +
src/test/test-crs-static4/service_config | 12 ++
.../test-crs-static4/static_service_stats | 12 ++
src/test/test-crs-static5/README | 5 +
src/test/test-crs-static5/cmdlist | 4 +
src/test/test-crs-static5/datacenter.cfg | 9 +
src/test/test-crs-static5/hardware_status | 5 +
src/test/test-crs-static5/log.expect | 117 ++++++++++++
src/test/test-crs-static5/manager_status | 1 +
src/test/test-crs-static5/service_config | 10 +
.../test-crs-static5/static_service_stats | 11 ++
41 files changed, 832 insertions(+)
create mode 100644 src/test/test-crs-static1/README
create mode 100644 src/test/test-crs-static1/cmdlist
create mode 100644 src/test/test-crs-static1/datacenter.cfg
create mode 100644 src/test/test-crs-static1/hardware_status
create mode 100644 src/test/test-crs-static1/log.expect
create mode 100644 src/test/test-crs-static1/manager_status
create mode 100644 src/test/test-crs-static1/service_config
create mode 100644 src/test/test-crs-static1/static_service_stats
create mode 100644 src/test/test-crs-static2/README
create mode 100644 src/test/test-crs-static2/cmdlist
create mode 100644 src/test/test-crs-static2/datacenter.cfg
create mode 100644 src/test/test-crs-static2/groups
create mode 100644 src/test/test-crs-static2/hardware_status
create mode 100644 src/test/test-crs-static2/log.expect
create mode 100644 src/test/test-crs-static2/manager_status
create mode 100644 src/test/test-crs-static2/service_config
create mode 100644 src/test/test-crs-static2/static_service_stats
create mode 100644 src/test/test-crs-static3/README
create mode 100644 src/test/test-crs-static3/cmdlist
create mode 100644 src/test/test-crs-static3/datacenter.cfg
create mode 100644 src/test/test-crs-static3/hardware_status
create mode 100644 src/test/test-crs-static3/log.expect
create mode 100644 src/test/test-crs-static3/manager_status
create mode 100644 src/test/test-crs-static3/service_config
create mode 100644 src/test/test-crs-static3/static_service_stats
create mode 100644 src/test/test-crs-static4/README
create mode 100644 src/test/test-crs-static4/cmdlist
create mode 100644 src/test/test-crs-static4/datacenter.cfg
create mode 100644 src/test/test-crs-static4/hardware_status
create mode 100644 src/test/test-crs-static4/log.expect
create mode 100644 src/test/test-crs-static4/manager_status
create mode 100644 src/test/test-crs-static4/service_config
create mode 100644 src/test/test-crs-static4/static_service_stats
create mode 100644 src/test/test-crs-static5/README
create mode 100644 src/test/test-crs-static5/cmdlist
create mode 100644 src/test/test-crs-static5/datacenter.cfg
create mode 100644 src/test/test-crs-static5/hardware_status
create mode 100644 src/test/test-crs-static5/log.expect
create mode 100644 src/test/test-crs-static5/manager_status
create mode 100644 src/test/test-crs-static5/service_config
create mode 100644 src/test/test-crs-static5/static_service_stats
diff --git a/src/test/test-crs-static1/README b/src/test/test-crs-static1/README
new file mode 100644
index 0000000..483f265
--- /dev/null
+++ b/src/test/test-crs-static1/README
@@ -0,0 +1,4 @@
+Test how service recovery works with the 'static' resource scheduling mode.
+
+Expect that the single service gets recovered to the node with the most
+available resources.
diff --git a/src/test/test-crs-static1/cmdlist b/src/test/test-crs-static1/cmdlist
new file mode 100644
index 0000000..8684073
--- /dev/null
+++ b/src/test/test-crs-static1/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "network node1 off" ]
+]
diff --git a/src/test/test-crs-static1/datacenter.cfg b/src/test/test-crs-static1/datacenter.cfg
new file mode 100644
index 0000000..8f83457
--- /dev/null
+++ b/src/test/test-crs-static1/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static"
+ }
+}
+
diff --git a/src/test/test-crs-static1/hardware_status b/src/test/test-crs-static1/hardware_status
new file mode 100644
index 0000000..0fa8c26
--- /dev/null
+++ b/src/test/test-crs-static1/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 200000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 300000000000 }
+}
diff --git a/src/test/test-crs-static1/log.expect b/src/test/test-crs-static1/log.expect
new file mode 100644
index 0000000..2b06b3c
--- /dev/null
+++ b/src/test/test-crs-static1/log.expect
@@ -0,0 +1,50 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute network node1 off
+info 120 node1/crm: status change master => lost_manager_lock
+info 120 node1/crm: status change lost_manager_lock => wait_for_quorum
+info 121 node1/lrm: status change active => lost_agent_lock
+info 162 watchdog: execute power node1 off
+info 161 node1/crm: killed by poweroff
+info 162 node1/lrm: killed by poweroff
+info 162 hardware: server 'node1' stopped by poweroff (watchdog)
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'static'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static1/manager_status b/src/test/test-crs-static1/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static1/service_config b/src/test/test-crs-static1/service_config
new file mode 100644
index 0000000..9c12447
--- /dev/null
+++ b/src/test/test-crs-static1/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static1/static_service_stats b/src/test/test-crs-static1/static_service_stats
new file mode 100644
index 0000000..7fb992d
--- /dev/null
+++ b/src/test/test-crs-static1/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 2, "maxmem": 4000000000 }
+}
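To spell out the README's expectation with the fixture numbers above (a simplification: the real scoring also weighs CPU, which is identical on both remaining nodes here): after node1 is fenced, placing the single 4 GB service on each candidate would use roughly

    node2: 4 GB of 200 GB  -> 2.0 % memory used
    node3: 4 GB of 300 GB  -> ~1.3 % memory used

so node3, the node with the most spare resources, wins, matching the recovery target in the log.expect above.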
diff --git a/src/test/test-crs-static2/README b/src/test/test-crs-static2/README
new file mode 100644
index 0000000..61530a7
--- /dev/null
+++ b/src/test/test-crs-static2/README
@@ -0,0 +1,4 @@
+Test how service recovery works with the 'static' resource scheduling mode.
+
+Expect that the single service always gets recovered to the node with the most
+available resources. It also tests that group priority still takes precedence.
diff --git a/src/test/test-crs-static2/cmdlist b/src/test/test-crs-static2/cmdlist
new file mode 100644
index 0000000..bada1bb
--- /dev/null
+++ b/src/test/test-crs-static2/cmdlist
@@ -0,0 +1,20 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on", "power node5 on" ],
+ [ "power node1 off" ],
+ [ "delay 300" ],
+ [ "power node1 on" ],
+ [ "delay 300" ],
+ [ "power node4 on" ],
+ [ "power node1 off" ],
+ [ "delay 300" ],
+ [ "power node1 on" ],
+ [ "delay 300" ],
+ [ "power node2 off" ],
+ [ "power node1 off" ],
+ [ "delay 300" ],
+ [ "power node1 on" ],
+ [ "delay 300" ],
+ [ "power node2 on" ],
+ [ "power node3 off" ],
+ [ "power node1 off" ]
+]
diff --git a/src/test/test-crs-static2/datacenter.cfg b/src/test/test-crs-static2/datacenter.cfg
new file mode 100644
index 0000000..8f83457
--- /dev/null
+++ b/src/test/test-crs-static2/datacenter.cfg
@@ -0,0 +1,6 @@
+{
+ "crs": {
+ "ha": "static"
+ }
+}
+
diff --git a/src/test/test-crs-static2/groups b/src/test/test-crs-static2/groups
new file mode 100644
index 0000000..43e9bf5
--- /dev/null
+++ b/src/test/test-crs-static2/groups
@@ -0,0 +1,2 @@
+group: prefer_node1
+ nodes node1
diff --git a/src/test/test-crs-static2/hardware_status b/src/test/test-crs-static2/hardware_status
new file mode 100644
index 0000000..d426023
--- /dev/null
+++ b/src/test/test-crs-static2/hardware_status
@@ -0,0 +1,7 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 200000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 300000000000 },
+ "node4": { "power": "off", "network": "off", "cpus": 64, "memory": 300000000000 },
+ "node5": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static2/log.expect b/src/test/test-crs-static2/log.expect
new file mode 100644
index 0000000..ee4416c
--- /dev/null
+++ b/src/test/test-crs-static2/log.expect
@@ -0,0 +1,171 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node5 on
+info 20 node5/crm: status change startup => wait_for_quorum
+info 20 node5/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 26 node5/crm: status change wait_for_quorum => slave
+info 120 cmdlist: execute power node1 off
+info 120 node1/crm: killed by poweroff
+info 120 node1/lrm: killed by poweroff
+info 220 cmdlist: execute delay 300
+info 222 node3/crm: got lock 'ha_manager_lock'
+info 222 node3/crm: status change slave => master
+info 222 node3/crm: using scheduler mode 'static'
+info 222 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 282 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 282 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 282 node3/crm: FENCE: Try to fence node 'node1'
+info 282 node3/crm: got lock 'ha_agent_node1_lock'
+info 282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node3'
+info 282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node3)
+info 283 node3/lrm: got lock 'ha_agent_node3_lock'
+info 283 node3/lrm: status change wait_for_agent_lock => active
+info 283 node3/lrm: starting service vm:102
+info 283 node3/lrm: service status vm:102 started
+info 600 cmdlist: execute power node1 on
+info 600 node1/crm: status change startup => wait_for_quorum
+info 600 node1/lrm: status change startup => wait_for_agent_lock
+info 600 node1/crm: status change wait_for_quorum => slave
+info 604 node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info 604 node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info 604 node3/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node3, target = node1)
+info 605 node3/lrm: service vm:102 - start migrate to node 'node1'
+info 605 node3/lrm: service vm:102 - end migrate to node 'node1'
+info 624 node3/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 641 node1/lrm: got lock 'ha_agent_node1_lock'
+info 641 node1/lrm: status change wait_for_agent_lock => active
+info 641 node1/lrm: starting service vm:102
+info 641 node1/lrm: service status vm:102 started
+info 700 cmdlist: execute delay 300
+info 1080 cmdlist: execute power node4 on
+info 1080 node4/crm: status change startup => wait_for_quorum
+info 1080 node4/lrm: status change startup => wait_for_agent_lock
+info 1084 node3/crm: node 'node4': state changed from 'unknown' => 'online'
+info 1086 node4/crm: status change wait_for_quorum => slave
+info 1180 cmdlist: execute power node1 off
+info 1180 node1/crm: killed by poweroff
+info 1180 node1/lrm: killed by poweroff
+info 1182 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 1242 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 1242 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 1242 node3/crm: FENCE: Try to fence node 'node1'
+info 1280 cmdlist: execute delay 300
+info 1282 node3/crm: got lock 'ha_agent_node1_lock'
+info 1282 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 1282 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 1282 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 1282 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 1282 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info 1282 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node4)
+info 1285 node4/lrm: got lock 'ha_agent_node4_lock'
+info 1285 node4/lrm: status change wait_for_agent_lock => active
+info 1285 node4/lrm: starting service vm:102
+info 1285 node4/lrm: service status vm:102 started
+info 1660 cmdlist: execute power node1 on
+info 1660 node1/crm: status change startup => wait_for_quorum
+info 1660 node1/lrm: status change startup => wait_for_agent_lock
+info 1660 node1/crm: status change wait_for_quorum => slave
+info 1664 node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info 1664 node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info 1664 node3/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node4, target = node1)
+info 1667 node4/lrm: service vm:102 - start migrate to node 'node1'
+info 1667 node4/lrm: service vm:102 - end migrate to node 'node1'
+info 1684 node3/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 1701 node1/lrm: got lock 'ha_agent_node1_lock'
+info 1701 node1/lrm: status change wait_for_agent_lock => active
+info 1701 node1/lrm: starting service vm:102
+info 1701 node1/lrm: service status vm:102 started
+info 1760 cmdlist: execute delay 300
+info 1825 node3/lrm: node had no service configured for 60 rounds, going idle.
+info 1825 node3/lrm: status change active => wait_for_agent_lock
+info 2140 cmdlist: execute power node2 off
+info 2140 node2/crm: killed by poweroff
+info 2140 node2/lrm: killed by poweroff
+info 2142 node3/crm: node 'node2': state changed from 'online' => 'unknown'
+info 2240 cmdlist: execute power node1 off
+info 2240 node1/crm: killed by poweroff
+info 2240 node1/lrm: killed by poweroff
+info 2240 node3/crm: node 'node1': state changed from 'online' => 'unknown'
+info 2300 node3/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 2300 node3/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 2300 node3/crm: FENCE: Try to fence node 'node1'
+info 2340 cmdlist: execute delay 300
+info 2360 node3/crm: got lock 'ha_agent_node1_lock'
+info 2360 node3/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 2360 node3/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 2360 node3/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 2360 node3/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 2360 node3/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info 2360 node3/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node4)
+info 2363 node4/lrm: starting service vm:102
+info 2363 node4/lrm: service status vm:102 started
+info 2720 cmdlist: execute power node1 on
+info 2720 node1/crm: status change startup => wait_for_quorum
+info 2720 node1/lrm: status change startup => wait_for_agent_lock
+info 2720 node1/crm: status change wait_for_quorum => slave
+info 2722 node3/crm: node 'node1': state changed from 'unknown' => 'online'
+info 2722 node3/crm: migrate service 'vm:102' to node 'node1' (running)
+info 2722 node3/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node4, target = node1)
+info 2725 node4/lrm: service vm:102 - start migrate to node 'node1'
+info 2725 node4/lrm: service vm:102 - end migrate to node 'node1'
+info 2742 node3/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 2761 node1/lrm: got lock 'ha_agent_node1_lock'
+info 2761 node1/lrm: status change wait_for_agent_lock => active
+info 2761 node1/lrm: starting service vm:102
+info 2761 node1/lrm: service status vm:102 started
+info 2820 cmdlist: execute delay 300
+info 3200 cmdlist: execute power node2 on
+info 3200 node2/crm: status change startup => wait_for_quorum
+info 3200 node2/lrm: status change startup => wait_for_agent_lock
+info 3202 node2/crm: status change wait_for_quorum => slave
+info 3204 node3/crm: node 'node2': state changed from 'unknown' => 'online'
+info 3300 cmdlist: execute power node3 off
+info 3300 node3/crm: killed by poweroff
+info 3300 node3/lrm: killed by poweroff
+info 3400 cmdlist: execute power node1 off
+info 3400 node1/crm: killed by poweroff
+info 3400 node1/lrm: killed by poweroff
+info 3420 node2/crm: got lock 'ha_manager_lock'
+info 3420 node2/crm: status change slave => master
+info 3420 node2/crm: using scheduler mode 'static'
+info 3420 node2/crm: node 'node1': state changed from 'online' => 'unknown'
+info 3420 node2/crm: node 'node3': state changed from 'online' => 'unknown'
+info 3480 node2/crm: service 'vm:102': state changed from 'started' to 'fence'
+info 3480 node2/crm: node 'node1': state changed from 'unknown' => 'fence'
+emai 3480 node2/crm: FENCE: Try to fence node 'node1'
+info 3520 node2/crm: got lock 'ha_agent_node1_lock'
+info 3520 node2/crm: fencing: acknowledged - got agent lock for node 'node1'
+info 3520 node2/crm: node 'node1': state changed from 'fence' => 'unknown'
+emai 3520 node2/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node1'
+info 3520 node2/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info 3520 node2/crm: recover service 'vm:102' from fenced node 'node1' to node 'node4'
+info 3520 node2/crm: service 'vm:102': state changed from 'recovery' to 'started' (node = node4)
+info 3523 node4/lrm: starting service vm:102
+info 3523 node4/lrm: service status vm:102 started
+info 4000 hardware: exit simulation - done
diff --git a/src/test/test-crs-static2/manager_status b/src/test/test-crs-static2/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static2/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static2/service_config b/src/test/test-crs-static2/service_config
new file mode 100644
index 0000000..1f2333d
--- /dev/null
+++ b/src/test/test-crs-static2/service_config
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "node": "node1", "state": "enabled", "group": "prefer_node1" }
+}
diff --git a/src/test/test-crs-static2/static_service_stats b/src/test/test-crs-static2/static_service_stats
new file mode 100644
index 0000000..7fb992d
--- /dev/null
+++ b/src/test/test-crs-static2/static_service_stats
@@ -0,0 +1,3 @@
+{
+ "vm:102": { "maxcpu": 2, "maxmem": 4000000000 }
+}
diff --git a/src/test/test-crs-static3/README b/src/test/test-crs-static3/README
new file mode 100644
index 0000000..db929e1
--- /dev/null
+++ b/src/test/test-crs-static3/README
@@ -0,0 +1,5 @@
+Test how shutdown migrate policy works with the 'static' resource scheduling
+mode.
+
+Expect that, when node1 is shut down, the services get migrated in the repeating
+sequence node2, node2, node3, because node2 has twice the resources of node3.
diff --git a/src/test/test-crs-static3/cmdlist b/src/test/test-crs-static3/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static3/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static3/datacenter.cfg b/src/test/test-crs-static3/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static3/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "static"
+ },
+ "ha": {
+ "shutdown_policy": "migrate"
+ }
+}
+
diff --git a/src/test/test-crs-static3/hardware_status b/src/test/test-crs-static3/hardware_status
new file mode 100644
index 0000000..dfbf496
--- /dev/null
+++ b/src/test/test-crs-static3/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 64, "memory": 200000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static3/log.expect b/src/test/test-crs-static3/log.expect
new file mode 100644
index 0000000..00cfefb
--- /dev/null
+++ b/src/test/test-crs-static3/log.expect
@@ -0,0 +1,131 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'vm:100' on node 'node1'
+info 20 node1/crm: adding new service 'vm:101' on node 'node1'
+info 20 node1/crm: adding new service 'vm:102' on node 'node1'
+info 20 node1/crm: adding new service 'vm:103' on node 'node1'
+info 20 node1/crm: adding new service 'vm:104' on node 'node1'
+info 20 node1/crm: adding new service 'vm:105' on node 'node1'
+info 20 node1/crm: adding new service 'vm:106' on node 'node1'
+info 20 node1/crm: adding new service 'vm:107' on node 'node1'
+info 20 node1/crm: adding new service 'vm:108' on node 'node1'
+info 20 node1/crm: adding new service 'vm:109' on node 'node1'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 21 node1/lrm: starting service vm:103
+info 21 node1/lrm: service status vm:103 started
+info 21 node1/lrm: starting service vm:104
+info 21 node1/lrm: service status vm:104 started
+info 21 node1/lrm: starting service vm:105
+info 21 node1/lrm: service status vm:105 started
+info 21 node1/lrm: starting service vm:106
+info 21 node1/lrm: service status vm:106 started
+info 21 node1/lrm: starting service vm:107
+info 21 node1/lrm: service status vm:107 started
+info 21 node1/lrm: starting service vm:108
+info 21 node1/lrm: service status vm:108 started
+info 21 node1/lrm: starting service vm:109
+info 21 node1/lrm: service status vm:109 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'vm:100': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute shutdown node1
+info 120 node1/lrm: got shutdown request with shutdown policy 'migrate'
+info 120 node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info 120 node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info 120 node1/crm: migrate service 'vm:101' to node 'node2' (running)
+info 120 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:102' to node 'node2' (running)
+info 120 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:103' to node 'node3' (running)
+info 120 node1/crm: service 'vm:103': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 120 node1/crm: migrate service 'vm:104' to node 'node2' (running)
+info 120 node1/crm: service 'vm:104': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:105' to node 'node2' (running)
+info 120 node1/crm: service 'vm:105': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:106' to node 'node3' (running)
+info 120 node1/crm: service 'vm:106': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 120 node1/crm: migrate service 'vm:107' to node 'node2' (running)
+info 120 node1/crm: service 'vm:107': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:108' to node 'node2' (running)
+info 120 node1/crm: service 'vm:108': state changed from 'started' to 'migrate' (node = node1, target = node2)
+info 120 node1/crm: migrate service 'vm:109' to node 'node3' (running)
+info 120 node1/crm: service 'vm:109': state changed from 'started' to 'migrate' (node = node1, target = node3)
+info 121 node1/lrm: status change active => maintenance
+info 121 node1/lrm: service vm:101 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:101 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:102 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:103 - start migrate to node 'node3'
+info 121 node1/lrm: service vm:103 - end migrate to node 'node3'
+info 121 node1/lrm: service vm:104 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:104 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:105 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:105 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:106 - start migrate to node 'node3'
+info 121 node1/lrm: service vm:106 - end migrate to node 'node3'
+info 121 node1/lrm: service vm:107 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:107 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:108 - start migrate to node 'node2'
+info 121 node1/lrm: service vm:108 - end migrate to node 'node2'
+info 121 node1/lrm: service vm:109 - start migrate to node 'node3'
+info 121 node1/lrm: service vm:109 - end migrate to node 'node3'
+info 140 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
+info 140 node1/crm: service 'vm:104': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:105': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:106': state changed from 'migrate' to 'started' (node = node3)
+info 140 node1/crm: service 'vm:107': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:108': state changed from 'migrate' to 'started' (node = node2)
+info 140 node1/crm: service 'vm:109': state changed from 'migrate' to 'started' (node = node3)
+info 142 node1/lrm: exit (loop end)
+info 142 shutdown: execute crm node1 stop
+info 141 node1/crm: server received shutdown request
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service vm:101
+info 143 node2/lrm: service status vm:101 started
+info 143 node2/lrm: starting service vm:102
+info 143 node2/lrm: service status vm:102 started
+info 143 node2/lrm: starting service vm:104
+info 143 node2/lrm: service status vm:104 started
+info 143 node2/lrm: starting service vm:105
+info 143 node2/lrm: service status vm:105 started
+info 143 node2/lrm: starting service vm:107
+info 143 node2/lrm: service status vm:107 started
+info 143 node2/lrm: starting service vm:108
+info 143 node2/lrm: service status vm:108 started
+info 145 node3/lrm: got lock 'ha_agent_node3_lock'
+info 145 node3/lrm: status change wait_for_agent_lock => active
+info 145 node3/lrm: starting service vm:103
+info 145 node3/lrm: service status vm:103 started
+info 145 node3/lrm: starting service vm:106
+info 145 node3/lrm: service status vm:106 started
+info 145 node3/lrm: starting service vm:109
+info 145 node3/lrm: service status vm:109 started
+info 160 node1/crm: voluntary release CRM lock
+info 161 node1/crm: exit (loop end)
+info 161 shutdown: execute power node1 off
+info 161 node2/crm: got lock 'ha_manager_lock'
+info 161 node2/crm: status change slave => master
+info 161 node2/crm: using scheduler mode 'static'
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static3/manager_status b/src/test/test-crs-static3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static3/service_config b/src/test/test-crs-static3/service_config
new file mode 100644
index 0000000..47f94d3
--- /dev/null
+++ b/src/test/test-crs-static3/service_config
@@ -0,0 +1,12 @@
+{
+ "vm:100": { "node": "node1", "state": "stopped" },
+ "vm:101": { "node": "node1", "state": "enabled" },
+ "vm:102": { "node": "node1", "state": "enabled" },
+ "vm:103": { "node": "node1", "state": "enabled" },
+ "vm:104": { "node": "node1", "state": "enabled" },
+ "vm:105": { "node": "node1", "state": "enabled" },
+ "vm:106": { "node": "node1", "state": "enabled" },
+ "vm:107": { "node": "node1", "state": "enabled" },
+ "vm:108": { "node": "node1", "state": "enabled" },
+ "vm:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static3/static_service_stats b/src/test/test-crs-static3/static_service_stats
new file mode 100644
index 0000000..bca71cb
--- /dev/null
+++ b/src/test/test-crs-static3/static_service_stats
@@ -0,0 +1,12 @@
+{
+ "vm:100": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:101": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:102": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:103": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:104": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:105": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:106": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:107": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:108": { "maxcpu": 2, "maxmem": 4000000000 },
+ "vm:109": { "maxcpu": 2, "maxmem": 4000000000 }
+}
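The 2:1 pattern expected by this test's README can be reproduced with a deliberately simplified model: greedily place each 4 GB service on whichever node would end up with the lower projected memory fraction, ties going to node2. This ignores CPU and the real scoring backend, but it shows why node2 (200 GB) absorbs two services for every one that node3 (100 GB) takes:

#!/usr/bin/perl
# Simplified illustration only -- NOT the real static scheduler, which also
# weighs CPU and uses a proper scoring backend. Numbers match the
# test-crs-static3 fixtures above (in GB instead of bytes).
use strict;
use warnings;

my %total = (node2 => 200, node3 => 100);    # memory per target node
my %used  = (node2 => 0,   node3 => 0);
my $service_mem = 4;                         # each vm uses 4 GB

for my $vmid (101 .. 109) {
    my ($target) = sort {
        ($used{$a} + $service_mem) / $total{$a}
            <=> ($used{$b} + $service_mem) / $total{$b}
            || $a cmp $b
    } keys %total;
    $used{$target} += $service_mem;
    print "vm:$vmid -> $target\n";
}
# prints node2, node2, node3 repeating -- the same targets as log.expect above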
diff --git a/src/test/test-crs-static4/README b/src/test/test-crs-static4/README
new file mode 100644
index 0000000..4dfb1bc
--- /dev/null
+++ b/src/test/test-crs-static4/README
@@ -0,0 +1,6 @@
+Test how shutdown migrate policy works with the 'static' resource scheduling
+mode.
+
+Expect that, when node1 is shut down, the first service is migrated to node2 and
+all others to node 3, because the first service is very resource-heavy compared
+to the others.
diff --git a/src/test/test-crs-static4/cmdlist b/src/test/test-crs-static4/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static4/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static4/datacenter.cfg b/src/test/test-crs-static4/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static4/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "static"
+ },
+ "ha": {
+ "shutdown_policy": "migrate"
+ }
+}
+
diff --git a/src/test/test-crs-static4/hardware_status b/src/test/test-crs-static4/hardware_status
new file mode 100644
index 0000000..a83a2dc
--- /dev/null
+++ b/src/test/test-crs-static4/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static4/log.expect b/src/test/test-crs-static4/log.expect
new file mode 100644
index 0000000..3eedc23
--- /dev/null
+++ b/src/test/test-crs-static4/log.expect
@@ -0,0 +1,149 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'ct:100' on node 'node1'
+info 20 node1/crm: adding new service 'ct:101' on node 'node1'
+info 20 node1/crm: adding new service 'ct:102' on node 'node1'
+info 20 node1/crm: adding new service 'ct:103' on node 'node1'
+info 20 node1/crm: adding new service 'ct:104' on node 'node1'
+info 20 node1/crm: adding new service 'ct:105' on node 'node1'
+info 20 node1/crm: adding new service 'ct:106' on node 'node1'
+info 20 node1/crm: adding new service 'ct:107' on node 'node1'
+info 20 node1/crm: adding new service 'ct:108' on node 'node1'
+info 20 node1/crm: adding new service 'ct:109' on node 'node1'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service ct:101
+info 21 node1/lrm: service status ct:101 started
+info 21 node1/lrm: starting service ct:102
+info 21 node1/lrm: service status ct:102 started
+info 21 node1/lrm: starting service ct:103
+info 21 node1/lrm: service status ct:103 started
+info 21 node1/lrm: starting service ct:104
+info 21 node1/lrm: service status ct:104 started
+info 21 node1/lrm: starting service ct:105
+info 21 node1/lrm: service status ct:105 started
+info 21 node1/lrm: starting service ct:106
+info 21 node1/lrm: service status ct:106 started
+info 21 node1/lrm: starting service ct:107
+info 21 node1/lrm: service status ct:107 started
+info 21 node1/lrm: starting service ct:108
+info 21 node1/lrm: service status ct:108 started
+info 21 node1/lrm: starting service ct:109
+info 21 node1/lrm: service status ct:109 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'ct:100': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute shutdown node1
+info 120 node1/lrm: got shutdown request with shutdown policy 'migrate'
+info 120 node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info 120 node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info 120 node1/crm: relocate service 'ct:101' to node 'node2'
+info 120 node1/crm: service 'ct:101': state changed from 'started' to 'relocate' (node = node1, target = node2)
+info 120 node1/crm: relocate service 'ct:102' to node 'node3'
+info 120 node1/crm: service 'ct:102': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:103' to node 'node3'
+info 120 node1/crm: service 'ct:103': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:104' to node 'node3'
+info 120 node1/crm: service 'ct:104': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:105' to node 'node3'
+info 120 node1/crm: service 'ct:105': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:106' to node 'node3'
+info 120 node1/crm: service 'ct:106': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:107' to node 'node3'
+info 120 node1/crm: service 'ct:107': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:108' to node 'node3'
+info 120 node1/crm: service 'ct:108': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:109' to node 'node3'
+info 120 node1/crm: service 'ct:109': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 121 node1/lrm: status change active => maintenance
+info 121 node1/lrm: service ct:101 - start relocate to node 'node2'
+info 121 node1/lrm: stopping service ct:101 (relocate)
+info 121 node1/lrm: service status ct:101 stopped
+info 121 node1/lrm: service ct:101 - end relocate to node 'node2'
+info 121 node1/lrm: service ct:102 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:102 (relocate)
+info 121 node1/lrm: service status ct:102 stopped
+info 121 node1/lrm: service ct:102 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:103 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:103 (relocate)
+info 121 node1/lrm: service status ct:103 stopped
+info 121 node1/lrm: service ct:103 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:104 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:104 (relocate)
+info 121 node1/lrm: service status ct:104 stopped
+info 121 node1/lrm: service ct:104 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:105 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:105 (relocate)
+info 121 node1/lrm: service status ct:105 stopped
+info 121 node1/lrm: service ct:105 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:106 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:106 (relocate)
+info 121 node1/lrm: service status ct:106 stopped
+info 121 node1/lrm: service ct:106 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:107 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:107 (relocate)
+info 121 node1/lrm: service status ct:107 stopped
+info 121 node1/lrm: service ct:107 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:108 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:108 (relocate)
+info 121 node1/lrm: service status ct:108 stopped
+info 121 node1/lrm: service ct:108 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:109 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:109 (relocate)
+info 121 node1/lrm: service status ct:109 stopped
+info 121 node1/lrm: service ct:109 - end relocate to node 'node3'
+info 140 node1/crm: service 'ct:101': state changed from 'relocate' to 'started' (node = node2)
+info 140 node1/crm: service 'ct:102': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:103': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:104': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:105': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:106': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:107': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:108': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:109': state changed from 'relocate' to 'started' (node = node3)
+info 142 node1/lrm: exit (loop end)
+info 142 shutdown: execute crm node1 stop
+info 141 node1/crm: server received shutdown request
+info 143 node2/lrm: got lock 'ha_agent_node2_lock'
+info 143 node2/lrm: status change wait_for_agent_lock => active
+info 143 node2/lrm: starting service ct:101
+info 143 node2/lrm: service status ct:101 started
+info 145 node3/lrm: got lock 'ha_agent_node3_lock'
+info 145 node3/lrm: status change wait_for_agent_lock => active
+info 145 node3/lrm: starting service ct:102
+info 145 node3/lrm: service status ct:102 started
+info 145 node3/lrm: starting service ct:103
+info 145 node3/lrm: service status ct:103 started
+info 145 node3/lrm: starting service ct:104
+info 145 node3/lrm: service status ct:104 started
+info 145 node3/lrm: starting service ct:105
+info 145 node3/lrm: service status ct:105 started
+info 145 node3/lrm: starting service ct:106
+info 145 node3/lrm: service status ct:106 started
+info 145 node3/lrm: starting service ct:107
+info 145 node3/lrm: service status ct:107 started
+info 145 node3/lrm: starting service ct:108
+info 145 node3/lrm: service status ct:108 started
+info 145 node3/lrm: starting service ct:109
+info 145 node3/lrm: service status ct:109 started
+info 160 node1/crm: voluntary release CRM lock
+info 161 node1/crm: exit (loop end)
+info 161 shutdown: execute power node1 off
+info 161 node2/crm: got lock 'ha_manager_lock'
+info 161 node2/crm: status change slave => master
+info 161 node2/crm: using scheduler mode 'static'
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static4/manager_status b/src/test/test-crs-static4/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static4/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static4/service_config b/src/test/test-crs-static4/service_config
new file mode 100644
index 0000000..b984a09
--- /dev/null
+++ b/src/test/test-crs-static4/service_config
@@ -0,0 +1,12 @@
+{
+ "ct:100": { "node": "node1", "state": "stopped" },
+ "ct:101": { "node": "node1", "state": "enabled" },
+ "ct:102": { "node": "node1", "state": "enabled" },
+ "ct:103": { "node": "node1", "state": "enabled" },
+ "ct:104": { "node": "node1", "state": "enabled" },
+ "ct:105": { "node": "node1", "state": "enabled" },
+ "ct:106": { "node": "node1", "state": "enabled" },
+ "ct:107": { "node": "node1", "state": "enabled" },
+ "ct:108": { "node": "node1", "state": "enabled" },
+ "ct:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static4/static_service_stats b/src/test/test-crs-static4/static_service_stats
new file mode 100644
index 0000000..878709b
--- /dev/null
+++ b/src/test/test-crs-static4/static_service_stats
@@ -0,0 +1,12 @@
+{
+ "ct:100": { "maxcpu": 2, "maxmem": 4000000000 },
+ "ct:101": { "maxcpu": 0, "maxmem": 40000000000 },
+ "ct:102": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:103": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:104": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:105": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:106": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:107": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:108": { "maxcpu": 2, "maxmem": 2000000000 },
+ "ct:109": { "maxcpu": 2, "maxmem": 2000000000 }
+}
diff --git a/src/test/test-crs-static5/README b/src/test/test-crs-static5/README
new file mode 100644
index 0000000..d9b5dc7
--- /dev/null
+++ b/src/test/test-crs-static5/README
@@ -0,0 +1,5 @@
+Test how the shutdown migrate policy works with the 'static' scheduling mode.
+
+Expect that, when node1 is shut down, all services are migrated to node3: the
+services need little memory, node2 and node3 each already host a high-memory
+service, and node3 has much more CPU left over.
diff --git a/src/test/test-crs-static5/cmdlist b/src/test/test-crs-static5/cmdlist
new file mode 100644
index 0000000..e84297f
--- /dev/null
+++ b/src/test/test-crs-static5/cmdlist
@@ -0,0 +1,4 @@
+[
+ [ "power node1 on", "power node2 on", "power node3 on"],
+ [ "shutdown node1" ]
+]
diff --git a/src/test/test-crs-static5/datacenter.cfg b/src/test/test-crs-static5/datacenter.cfg
new file mode 100644
index 0000000..caa8148
--- /dev/null
+++ b/src/test/test-crs-static5/datacenter.cfg
@@ -0,0 +1,9 @@
+{
+ "crs": {
+ "ha": "static"
+ },
+ "ha": {
+ "shutdown_policy": "migrate"
+ }
+}
+
diff --git a/src/test/test-crs-static5/hardware_status b/src/test/test-crs-static5/hardware_status
new file mode 100644
index 0000000..3eb9e73
--- /dev/null
+++ b/src/test/test-crs-static5/hardware_status
@@ -0,0 +1,5 @@
+{
+ "node1": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 100000000000 },
+ "node3": { "power": "off", "network": "off", "cpus": 128, "memory": 100000000000 }
+}
diff --git a/src/test/test-crs-static5/log.expect b/src/test/test-crs-static5/log.expect
new file mode 100644
index 0000000..cb6b0d5
--- /dev/null
+++ b/src/test/test-crs-static5/log.expect
@@ -0,0 +1,117 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: using scheduler mode 'static'
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'ct:102' on node 'node2'
+info 20 node1/crm: adding new service 'ct:103' on node 'node3'
+info 20 node1/crm: adding new service 'ct:104' on node 'node1'
+info 20 node1/crm: adding new service 'ct:105' on node 'node1'
+info 20 node1/crm: adding new service 'ct:106' on node 'node1'
+info 20 node1/crm: adding new service 'ct:107' on node 'node1'
+info 20 node1/crm: adding new service 'ct:108' on node 'node1'
+info 20 node1/crm: adding new service 'ct:109' on node 'node1'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service ct:104
+info 21 node1/lrm: service status ct:104 started
+info 21 node1/lrm: starting service ct:105
+info 21 node1/lrm: service status ct:105 started
+info 21 node1/lrm: starting service ct:106
+info 21 node1/lrm: service status ct:106 started
+info 21 node1/lrm: starting service ct:107
+info 21 node1/lrm: service status ct:107 started
+info 21 node1/lrm: starting service ct:108
+info 21 node1/lrm: service status ct:108 started
+info 21 node1/lrm: starting service ct:109
+info 21 node1/lrm: service status ct:109 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service ct:102
+info 23 node2/lrm: service status ct:102 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service ct:103
+info 25 node3/lrm: service status ct:103 started
+info 120 cmdlist: execute shutdown node1
+info 120 node1/lrm: got shutdown request with shutdown policy 'migrate'
+info 120 node1/lrm: shutdown LRM, doing maintenance, removing this node from active list
+info 120 node1/crm: node 'node1': state changed from 'online' => 'maintenance'
+info 120 node1/crm: relocate service 'ct:104' to node 'node3'
+info 120 node1/crm: service 'ct:104': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:105' to node 'node3'
+info 120 node1/crm: service 'ct:105': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:106' to node 'node3'
+info 120 node1/crm: service 'ct:106': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:107' to node 'node3'
+info 120 node1/crm: service 'ct:107': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:108' to node 'node3'
+info 120 node1/crm: service 'ct:108': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 120 node1/crm: relocate service 'ct:109' to node 'node3'
+info 120 node1/crm: service 'ct:109': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 121 node1/lrm: status change active => maintenance
+info 121 node1/lrm: service ct:104 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:104 (relocate)
+info 121 node1/lrm: service status ct:104 stopped
+info 121 node1/lrm: service ct:104 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:105 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:105 (relocate)
+info 121 node1/lrm: service status ct:105 stopped
+info 121 node1/lrm: service ct:105 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:106 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:106 (relocate)
+info 121 node1/lrm: service status ct:106 stopped
+info 121 node1/lrm: service ct:106 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:107 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:107 (relocate)
+info 121 node1/lrm: service status ct:107 stopped
+info 121 node1/lrm: service ct:107 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:108 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:108 (relocate)
+info 121 node1/lrm: service status ct:108 stopped
+info 121 node1/lrm: service ct:108 - end relocate to node 'node3'
+info 121 node1/lrm: service ct:109 - start relocate to node 'node3'
+info 121 node1/lrm: stopping service ct:109 (relocate)
+info 121 node1/lrm: service status ct:109 stopped
+info 121 node1/lrm: service ct:109 - end relocate to node 'node3'
+info 140 node1/crm: service 'ct:104': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:105': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:106': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:107': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:108': state changed from 'relocate' to 'started' (node = node3)
+info 140 node1/crm: service 'ct:109': state changed from 'relocate' to 'started' (node = node3)
+info 142 node1/lrm: exit (loop end)
+info 142 shutdown: execute crm node1 stop
+info 141 node1/crm: server received shutdown request
+info 145 node3/lrm: starting service ct:104
+info 145 node3/lrm: service status ct:104 started
+info 145 node3/lrm: starting service ct:105
+info 145 node3/lrm: service status ct:105 started
+info 145 node3/lrm: starting service ct:106
+info 145 node3/lrm: service status ct:106 started
+info 145 node3/lrm: starting service ct:107
+info 145 node3/lrm: service status ct:107 started
+info 145 node3/lrm: starting service ct:108
+info 145 node3/lrm: service status ct:108 started
+info 145 node3/lrm: starting service ct:109
+info 145 node3/lrm: service status ct:109 started
+info 160 node1/crm: voluntary release CRM lock
+info 161 node1/crm: exit (loop end)
+info 161 shutdown: execute power node1 off
+info 161 node2/crm: got lock 'ha_manager_lock'
+info 161 node2/crm: status change slave => master
+info 161 node2/crm: using scheduler mode 'static'
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-crs-static5/manager_status b/src/test/test-crs-static5/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static5/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static5/service_config b/src/test/test-crs-static5/service_config
new file mode 100644
index 0000000..43c5f60
--- /dev/null
+++ b/src/test/test-crs-static5/service_config
@@ -0,0 +1,10 @@
+{
+ "ct:102": { "node": "node2", "state": "enabled" },
+ "ct:103": { "node": "node3", "state": "enabled" },
+ "ct:104": { "node": "node1", "state": "enabled" },
+ "ct:105": { "node": "node1", "state": "enabled" },
+ "ct:106": { "node": "node1", "state": "enabled" },
+ "ct:107": { "node": "node1", "state": "enabled" },
+ "ct:108": { "node": "node1", "state": "enabled" },
+ "ct:109": { "node": "node1", "state": "enabled" }
+}
diff --git a/src/test/test-crs-static5/static_service_stats b/src/test/test-crs-static5/static_service_stats
new file mode 100644
index 0000000..6293f63
--- /dev/null
+++ b/src/test/test-crs-static5/static_service_stats
@@ -0,0 +1,11 @@
+{
+ "ct:101": { "maxcpu": 0, "maxmem": 40000000000 },
+ "ct:102": { "maxcpu": 0.5, "maxmem": 40000000000 },
+ "ct:103": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:104": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:105": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:106": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:107": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:108": { "maxcpu": 0.5, "maxmem": 200000000 },
+ "ct:109": { "maxcpu": 0.5, "maxmem": 200000000 }
+}
--
2.30.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (13 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-18 7:48 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
` (2 subsequent siblings)
17 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
src/PVE/HA/Resources/PVECT.pm | 2 ++
src/PVE/HA/Resources/PVEVM.pm | 2 ++
2 files changed, 4 insertions(+)
diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
index 4c9530d..e77d98c 100644
--- a/src/PVE/HA/Resources/PVECT.pm
+++ b/src/PVE/HA/Resources/PVECT.pm
@@ -3,6 +3,8 @@ package PVE::HA::Resources::PVECT;
use strict;
use warnings;
+use PVE::Cluster;
+
use PVE::HA::Tools;
BEGIN {
diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
index 49e4a1d..f405d86 100644
--- a/src/PVE/HA/Resources/PVEVM.pm
+++ b/src/PVE/HA/Resources/PVEVM.pm
@@ -3,6 +3,8 @@ package PVE::HA::Resources::PVEVM;
use strict;
use warnings;
+use PVE::Cluster;
+
use PVE::HA::Tools;
BEGIN {
--
2.30.2
* [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (14 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
briefly describing the 'basic' and 'static' modes, with a note mentioning
plans for balancers.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Changes from v1:
* Mention that it also affects shutdown policy migrations.
* Describe static mode in more detail.
ha-manager.adoc | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 54db2a5..038193f 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -933,6 +933,51 @@ NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or
immediate node reboot or even reset.
+Scheduler Mode
+--------------
+
+The scheduler mode controls how HA selects nodes for the recovery of a service
+as well as for migrations that are triggered by a shutdown policy. The default
+mode is `basic`; you can change it in `datacenter.cfg`:
+
+----
+crs: ha=static
+----
+
+The change will be in effect when a new master takes over. This can be triggered
+by executing the following on the current master's node:
+
+----
+systemctl reload-or-restart pve-ha-crm.service
+----
+
+For each service that needs to be recovered or migrated, the scheduler
+iteratively chooses the best node among the nodes with the highest priority in
+the service's group.
+
+NOTE: There are plans to add modes for (static and dynamic) load-balancing in
+the future.
+
+Basic
+^^^^^
+
+The number of active HA services on each node is used to choose a recovery node.
+
+Static
+^^^^^^
+
+Static usage information from HA services on each node is used to choose a
+recovery node.
+
+For this selection, each node in turn is considered as if the service were
+already running on it, using CPU and memory usage from the associated guest
+configuration. Then, for each such alternative, the CPU and memory usage of
+all nodes is considered, with memory being weighted much more, because it is
+a truly limited resource. For both CPU and memory, the highest usage among
+nodes (weighted more, as ideally no node should be overcommitted) and the
+average usage of all nodes (to still be able to distinguish in case there
+already is a more highly committed node) are taken into account.
+
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]
--
2.30.2
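To make the description above a bit more tangible, the following is a rough, hypothetical sketch of such a node-scoring step. It is not the actual PVE::HA::Usage::Static implementation; the function names, the data layout and especially the numeric weights are made-up placeholders that only encode the two stated ideas, namely that memory counts more than CPU and that the highest per-node usage counts more than the average.

----
# Hypothetical scoring sketch only -- NOT the actual PVE::HA::Usage::Static
# code; names, data layout and weights are illustrative assumptions.
use strict;
use warnings;
use List::Util qw(max sum);

# An "alternative" maps each node name to its projected usage, i.e. the
# usage it would have if the to-be-placed service were already running on
# the candidate node: { nodename => { cpu => <fraction>, mem => <fraction> } }.
# Lower score is better.
sub score_alternative {
    my ($nodes) = @_;

    my @cpu = map { $_->{cpu} } values %$nodes;
    my @mem = map { $_->{mem} } values %$nodes;
    my $avg = sub { sum(@_) / scalar(@_) };

    # Memory is weighted more than CPU, and the highest usage among nodes is
    # weighted more than the average, matching the description above. The
    # concrete factors are placeholders.
    return 1.0 * max(@mem) + 0.5 * $avg->(@mem)
        + 0.1 * max(@cpu) + 0.05 * $avg->(@cpu);
}

# Pick the candidate whose projected cluster usage scores lowest.
sub choose_node {
    my ($alternatives) = @_;    # { candidate_node => alternative hash as above }

    my %score = map { $_ => score_alternative($alternatives->{$_}) } keys %$alternatives;
    my ($best) = sort { $score{$a} <=> $score{$b} } keys %score;
    return $best;
}
----

In this sketch, each alternative is the projected usage of all nodes under the assumption that the service already runs on one particular candidate; the candidate whose projection scores lowest wins.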
* [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (15 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
@ 2022-11-17 14:00 ` Fiona Ebner
2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht
17 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2022-11-17 14:00 UTC (permalink / raw)
To: pve-devel
In the HA manager, the function recompute_online_node_usage() is currently
called very often, and the 'static' mode needs to read the guest configs,
which adds a bit of overhead.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
New in v2.
ha-manager.adoc | 3 +++
1 file changed, 3 insertions(+)
diff --git a/ha-manager.adoc b/ha-manager.adoc
index 038193f..710cbca 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -966,6 +966,9 @@ The number of active HA services on each node is used to choose a recovery node.
Static
^^^^^^
+WARNING: The static mode is still a technology preview. It is not recommended to
+use it if you have thousands of HA managed services.
+
Static usage information from HA services on each node is used to choose a
recovery node.
--
2.30.2
* Re: [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
@ 2022-11-18 7:48 ` Fiona Ebner
2022-11-18 12:48 ` Thomas Lamprecht
0 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2022-11-18 7:48 UTC (permalink / raw)
To: pve-devel
On 17.11.22 at 15:00, Fiona Ebner wrote:
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> New in v2.
>
> src/PVE/HA/Resources/PVECT.pm | 2 ++
> src/PVE/HA/Resources/PVEVM.pm | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/src/PVE/HA/Resources/PVECT.pm b/src/PVE/HA/Resources/PVECT.pm
> index 4c9530d..e77d98c 100644
> --- a/src/PVE/HA/Resources/PVECT.pm
> +++ b/src/PVE/HA/Resources/PVECT.pm
> @@ -3,6 +3,8 @@ package PVE::HA::Resources::PVECT;
> use strict;
> use warnings;
>
> +use PVE::Cluster;
> +
> use PVE::HA::Tools;
>
> BEGIN {
Might be better added to the BEGIN block here, and not pull it in for
doc generation in the spirit of a1c8862 ("buildsys: don't pull qemu/lxc
during doc-generation")
> diff --git a/src/PVE/HA/Resources/PVEVM.pm b/src/PVE/HA/Resources/PVEVM.pm
> index 49e4a1d..f405d86 100644
> --- a/src/PVE/HA/Resources/PVEVM.pm
> +++ b/src/PVE/HA/Resources/PVEVM.pm
> @@ -3,6 +3,8 @@ package PVE::HA::Resources::PVEVM;
> use strict;
> use warnings;
>
> +use PVE::Cluster;
> +
> use PVE::HA::Tools;
>
> BEGIN {
* Re: [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements
2022-11-18 7:48 ` Fiona Ebner
@ 2022-11-18 12:48 ` Thomas Lamprecht
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Lamprecht @ 2022-11-18 12:48 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 18/11/2022 at 08:48, Fiona Ebner wrote:
>> +use PVE::Cluster;
>> +
>> use PVE::HA::Tools;
>>
>> BEGIN {
> Might be better added to the BEGIN block here, and not pull it in for
> doc generation in the spirit of a1c8862 ("buildsys: don't pull qemu/lxc
> during doc-generation")
>
Not relevant; we only do that for pve-container & qemu-server dependencies,
as those have a cyclic dependency with pve-ha-manager, so we only guard those
to make bootstrapping easier.
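For readers not familiar with the guard being referenced here, the pattern looks roughly like the following sketch. This is schematic only, not a verbatim copy of PVECT.pm; in particular, the exact environment variable and the exact list of required modules are assumptions.

----
package PVE::HA::Resources::PVECT;

use strict;
use warnings;

use PVE::HA::Tools;

BEGIN {
    # Only pull in the heavy pve-container dependency when not generating
    # documentation, so the cyclic build dependency between pve-ha-manager
    # and pve-container stays manageable. The exact guard variable is an
    # assumption in this sketch.
    if (!$ENV{PVE_GENERATING_DOCS}) {
        require PVE::LXC;
        require PVE::LXC::Config;
    }
}

1;
----

Since PVE::Cluster has no such cycle with pve-ha-manager, a plain top-level `use` is sufficient, which is the conclusion drawn above.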
* [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
` (16 preceding siblings ...)
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
@ 2022-11-18 13:23 ` Thomas Lamprecht
17 siblings, 0 replies; 21+ messages in thread
From: Thomas Lamprecht @ 2022-11-18 13:23 UTC (permalink / raw)
To: Proxmox VE development discussion, Fiona Ebner
On 17/11/2022 at 15:00, Fiona Ebner wrote:
> ha-manager:
>
> Fiona Ebner (15):
> env: add get_static_node_stats() method
> resources: add get_static_stats() method
> add Usage base plugin and Usage::Basic plugin
> manager: select service node: add $sid to parameters
> manager: online node usage: switch to Usage::Basic plugin
> usage: add Usage::Static plugin
> env: rename get_ha_settings to get_datacenter_settings
> env: datacenter config: include crs (cluster-resource-scheduling)
> setting
> manager: set resource scheduler mode upon init
> manager: use static resource scheduler when configured
> manager: avoid scoring nodes if maintenance fallback node is valid
> manager: avoid scoring nodes when not trying next and current node is
> valid
> usage: static: use service count on nodes as a fallback
> test: add tests for static resource scheduling
> resources: add missing PVE::Cluster use statements
>
>
> docs:
>
> Fiona Ebner (2):
> ha: add section about scheduler modes
> ha: add warning against using 'static' mode with many services
>
> ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
nice work! applied series, thanks!