public inbox for pve-devel@lists.proxmox.com
From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured
Date: Thu, 17 Nov 2022 15:00:11 +0100
Message-ID: <20221117140018.105004-11-f.ebner@proxmox.com>
In-Reply-To: <20221117140018.105004-1-f.ebner@proxmox.com>

Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations involving hundreds of services (with low thousands of
services overall). With generous estimates, it could even run into the
watchdog timer.

Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Add fixme note about overhead.
    * Add benchmark results to commit message.
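
For reference, per-call timings like the ones quoted in the commit
message can be collected with Time::HiRes from core Perl. This is only
a minimal sketch: time_calls() and the dummy workload are hypothetical
and not part of ha-manager. For a real measurement, the code reference
would wrap the call to recompute_online_node_usage() on a manager
instance.

```perl
use strict;
use warnings;

use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run $code $count times and return the fastest and
# slowest wall-clock duration (in seconds) of a single call.
sub time_calls {
    my ($code, $count) = @_;

    my ($min, $max);
    for (1 .. $count) {
        my $t0 = [gettimeofday];
        $code->();
        my $elapsed = tv_interval($t0); # seconds since $t0

        $min = $elapsed if !defined($min) || $elapsed < $min;
        $max = $elapsed if !defined($max) || $elapsed > $max;
    }
    return ($min, $max);
}

# Dummy workload standing in for recompute_online_node_usage().
my ($min, $max) = time_calls(sub { my $x = 0; $x += $_ for 1 .. 1000; }, 100);
printf "timings were between %.6f - %.6f seconds\n", $min, $max;
```

Reporting the minimum and maximum rather than an average matches how
the ranges above are stated.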

 src/PVE/HA/Manager.pm | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1638442..7f1d1d7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,7 @@ use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
 use PVE::HA::Usage::Basic;
+use PVE::HA::Usage::Static;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -203,14 +204,35 @@ my $valid_service_states = {
     error => 1,
 };
 
+# FIXME with 'static' mode and thousands of services, the overhead can be noticeable and the fact
+# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
+    my $haenv = $self->{haenv};
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    $online_node_usage->add_node($_) for $online_nodes->@*;
+    my $online_node_usage;
+
+    if (my $mode = $self->{'scheduler-mode'}) {
+	if ($mode eq 'static') {
+	    $online_node_usage = eval {
+		my $scheduler = PVE::HA::Usage::Static->new($haenv);
+		$scheduler->add_node($_) for $online_nodes->@*;
+		return $scheduler;
+	    };
+	    $haenv->log('warning', "using 'basic' scheduler mode, init for 'static' failed - $@")
+		if $@;
+	} elsif ($mode ne 'basic') {
+	    $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
+	}
+    }
+
+    if (!$online_node_usage) {
+	$online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+	$online_node_usage->add_node($_) for $online_nodes->@*;
+    }
 
     foreach my $sid (keys %{$self->{ss}}) {
 	my $sd = $self->{ss}->{$sid};
-- 
2.30.2
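
The mode selection with fallback from the hunk above can be tried out
in isolation with the following sketch. Everything in it is an
assumption made for illustration: the Sketch::Usage::* packages are
stand-ins for the real PVE::HA::Usage plugins, pick_usage() is a
hypothetical free-standing version of the selection logic, and a plain
callback replaces $haenv->log().

```perl
use strict;
use warnings;

# Hypothetical stand-ins for PVE::HA::Usage::Basic and ::Static. The real
# plugins take an HA environment object and track actual usage.
package Sketch::Usage::Basic {
    sub new { my ($class) = @_; return bless { nodes => {} }, $class; }
    sub add_node { my ($self, $node) = @_; $self->{nodes}->{$node} = 0; }
}

package Sketch::Usage::Static {
    our $fail_init = 0; # set to 1 to simulate a failing init

    sub new {
        my ($class) = @_;
        die "simulated init failure\n" if $fail_init;
        return bless { nodes => {} }, $class;
    }
    sub add_node { my ($self, $node) = @_; $self->{nodes}->{$node} = 0; }
}

package main;

# Hypothetical free-standing variant of the selection logic: try 'static'
# when requested, and fall back to 'basic' on failure or unknown mode.
sub pick_usage {
    my ($mode, $online_nodes, $log) = @_;

    my $usage;

    if ($mode) {
        if ($mode eq 'static') {
            # eval yields undef if new() or add_node() dies
            $usage = eval {
                my $scheduler = Sketch::Usage::Static->new();
                $scheduler->add_node($_) for $online_nodes->@*;
                return $scheduler;
            };
            $log->("using 'basic' scheduler mode, init for 'static' failed - $@")
                if $@;
        } elsif ($mode ne 'basic') {
            $log->("got unknown scheduler mode '$mode', using 'basic'");
        }
    }

    if (!$usage) {
        $usage = Sketch::Usage::Basic->new();
        $usage->add_node($_) for $online_nodes->@*;
    }

    return $usage;
}

my $usage = pick_usage('static', ['node1', 'node2'], sub { warn "$_[0]\n" });
print ref($usage), "\n"; # prints "Sketch::Usage::Static"
```

Setting $Sketch::Usage::Static::fail_init to 1 before the call makes
the eval return undef, so the warning is logged and a
Sketch::Usage::Basic object is returned instead.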


Thread overview: 21+ messages
2022-11-17 14:00 [pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 01/15] env: add get_static_node_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 02/15] resources: add get_static_stats() method Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 03/15] add Usage base plugin and Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 04/15] manager: select service node: add $sid to parameters Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 05/15] manager: online node usage: switch to Usage::Basic plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 06/15] usage: add Usage::Static plugin Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 07/15] env: rename get_ha_settings to get_datacenter_settings Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 08/15] env: datacenter config: include crs (cluster-resource-scheduling) setting Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 09/15] manager: set resource scheduler mode upon init Fiona Ebner
2022-11-17 14:00 ` Fiona Ebner [this message]
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 11/15] manager: avoid scoring nodes if maintenance fallback node is valid Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 12/15] manager: avoid scoring nodes when not trying next and current " Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 13/15] usage: static: use service count on nodes as a fallback Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 14/15] test: add tests for static resource scheduling Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 ha-manager 15/15] resources: add missing PVE::Cluster use statements Fiona Ebner
2022-11-18  7:48   ` Fiona Ebner
2022-11-18 12:48     ` Thomas Lamprecht
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 1/2] ha: add section about scheduler modes Fiona Ebner
2022-11-17 14:00 ` [pve-devel] [PATCH v2 docs 2/2] ha: add warning against using 'static' mode with many services Fiona Ebner
2022-11-18 13:23 ` [pve-devel] applied-series: [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager Thomas Lamprecht