From: Daniel Kral
To: pve-devel@lists.proxmox.com
Subject: [PATCH docs 18/18] ha-manager: crs: add load balancer section
Date: Thu, 9 Apr 2026 13:41:44 +0200
Message-ID: <20260409114224.323102-19-d.kral@proxmox.com>
In-Reply-To: <20260409114224.323102-1-d.kral@proxmox.com>
References: <20260409114224.323102-1-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

For pve-ha-manager >= 5.2.0, the HA Manager features an automatic rebalancing
system, which is available for the static and dynamic-load scheduler mode.

Signed-off-by: Daniel Kral
---
 ha-manager.adoc | 51 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 4162071..0ba00cd 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -1435,9 +1435,6 @@ scheduling decisions. The CRS mode can be changed in the web interface
 is `basic`. The change will take effect starting with the next HA Manager
 round, which should take no longer than 10 seconds.
 
-NOTE: There are plans to add modes for (static and dynamic) load-balancing in
-the future.
-
 [[_basic_scheduler]]
 Basic Scheduler
 ^^^^^^^^^^^^^^^
@@ -1526,6 +1523,54 @@ functionality is still in technology preview. The more HA resources the more
 possible combinations there are, so it's currently not recommended to use it
 if you have thousands of HA resources.
 
+CRS Load Balancer
+~~~~~~~~~~~~~~~~~
+
+IMPORTANT: This feature is still in technology preview.
+
+The utilization of individual cluster nodes can vary greatly depending on where
+guests are currently running and how much load these exert at any moment. This
+can cause the node utilizations to become imbalanced over time, where HA
+resources on one node become bottlenecked due to resource contention, while
+another node is barely utilized.
+
+The HA Manager's automatic load balancer can lower the overall cluster
+imbalance by automatically issuing rebalancing migrations. Currently, these
+migrations are issued only for HA resources and executed sequentially.
+Furthermore, these migrations always obey the current HA affinity rules.
+
+The automatic load balancing system can be enabled and configured in the web
+interface under `Datacenter` -> `Options` -> `Cluster Resource Scheduling`.
+This allows fine-tuning the various parameters as described in the next
+paragraph for different cluster setups and requirements.
+The load balancer requires either the xref:_static_scheduler[static] or
+xref:_dynamic_scheduler[dynamic-load scheduler mode] to be in use.
+
+In every HA Manager round, each of which lasts around 10 seconds, the automatic
+load balancer checks whether the cluster imbalance exceeds the _imbalance
+threshold_. If the threshold is exceeded for at least as many consecutive HA
+Manager rounds as given by the _hold duration_, the load balancer selects the
+rebalancing migration of an HA resource that lowers the overall cluster
+imbalance the most. The method used to select the best rebalancing migration
+can be set with the _rebalancing method_ option. Finally, if the selected
+migration reduces the current cluster imbalance by at least the percentage
+given by the _imbalance improvement_ option, the load balancer issues the
+selected migration.
+
+The cluster imbalance is calculated as the ratio of the standard deviation to
+the mean of the individual node loads. The _imbalance threshold_ therefore
+controls how readily the load balancer is triggered. If the _imbalance
+threshold_ is set to `0.0`, the load balancer tries to select a rebalancing
+migration every time the _hold duration_ is reached.
+The _imbalance improvement_ percentage determines what constitutes an
+acceptable rebalancing migration. If it is set to `0.0`, the load balancer
+commits to a migration even if the expected imbalance does not change at all.
+However, the value can never be negative, so the load balancer never commits
+to a migration with an expected imbalance worse than the current one.
+
+NOTE: It is planned that load balancing will also include non-HA resources in
+the future.
+
 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
 endif::manvolnum[]
-- 
2.47.3
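
For illustration only, the decision logic described in the CRS Load Balancer section can be sketched in Python. This is not the pve-ha-manager implementation. The function names (`cluster_imbalance`, `should_migrate`), the parameter names, the use of the population standard deviation, and the treatment of an idle cluster as balanced are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def cluster_imbalance(node_loads):
    """Ratio of the standard deviation to the mean of the node loads.

    Assumption: population standard deviation; the documentation does not
    state which variant is used.
    """
    avg = mean(node_loads)
    if avg == 0:
        # Assumption: a cluster without any load counts as balanced.
        return 0.0
    return pstdev(node_loads) / avg


def should_migrate(current, expected, threshold, hold_duration,
                   rounds_exceeded, improvement_pct):
    """Decide whether the selected rebalancing migration would be issued.

    current         -- imbalance of the cluster as it is now
    expected        -- imbalance after the selected migration
    threshold       -- the "imbalance threshold" option
    hold_duration   -- required consecutive rounds above the threshold
    rounds_exceeded -- consecutive rounds the threshold has been exceeded
    improvement_pct -- the "imbalance improvement" option, in percent
    """
    if current <= threshold:
        return False  # imbalance does not exceed the threshold
    if rounds_exceeded < hold_duration:
        return False  # not sustained for long enough yet
    # current > threshold >= 0 here, so the division is safe.
    reduction_pct = (current - expected) / current * 100.0
    return reduction_pct >= improvement_pct


# Hypothetical three-node cluster before and after moving one guest.
before = cluster_imbalance([0.9, 0.5, 0.1])
after = cluster_imbalance([0.7, 0.5, 0.3])
print(should_migrate(before, after, threshold=0.3, hold_duration=3,
                     rounds_exceeded=3, improvement_pct=10.0))
```

In this hypothetical example the migration roughly halves the imbalance, so it passes a 10 percent improvement requirement once the threshold has been exceeded for three consecutive rounds. With `improvement_pct=0.0`, a migration that leaves the imbalance unchanged is still accepted, while one that worsens it is rejected, which matches the behaviour described for a value of `0.0`.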