Date: Mon, 04 Aug 2025 17:24:33 +0200
From: "Hannes Duerr"
To: "Proxmox VE development discussion"
Cc: "pve-devel"
Subject: Re: [pve-devel] [PATCH docs 5/5] ha: replace in-text references to ha groups with ha rules

On Mon Aug 4, 2025 at 4:11 PM CEST, Daniel Kral wrote:
> As HA groups are replaced by HA node affinity rules and user can
> implement new CRS behavior with HA resource affinity rules now, update
> texts that reference HA groups with references to HA rules instead.
>
> While at it, also replace references to "HA services" with "HA
> resources" for short sections that are touched in the process as new
> references should use the latter term only.
>
> Signed-off-by: Daniel Kral
> ---
>  ha-manager.adoc | 49 +++++++++++++++++++++++++------------------------
>  1 file changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/ha-manager.adoc b/ha-manager.adoc
> index ffab83c..f63fd05 100644
> --- a/ha-manager.adoc
> +++ b/ha-manager.adoc
> @@ -314,9 +314,8 @@ recovery state.
>  recovery::
>
>  Wait for recovery of the service. The HA manager tries to find a new node where

forgot to change the `service` here?

> -the service can run on. This search depends not only on the list of online and
> -quorate nodes, but also if the service is a group member and how such a group
> -is limited.
> +the service can run on. This search depends on the list of online and quorate

s/service/resource/

> +nodes as well as the affinity rules the service is part of, if any.
s/service/resource/

>  As soon as a new available node is found, the service will be moved there and

forgot to change the `service` here?

>  initially placed into stopped state. If it's configured to run the new node
>  will do so.
> @@ -977,20 +976,24 @@ Recover Fenced Services
>  ~~~~~~~~~~~~~~~~~~~~~~~
>
>  After a node failed and its fencing was successful, the CRM tries to
> -move services from the failed node to nodes which are still online.
> +move HA resources from the failed node to nodes which are still online.
>
> -The selection of nodes, on which those services gets recovered, is
> -influenced by the resource `group` settings, the list of currently active
> -nodes, and their respective active service count.
> +The selection of the recovery nodes is influenced by the list of
> +currently active nodes, their respective loads depending on the used
> +scheduler, and the affinity rules the resource is part of, if any.
>
> -The CRM first builds a set out of the intersection between user selected
> -nodes (from `group` setting) and available nodes. It then choose the
> -subset of nodes with the highest priority, and finally select the node
> -with the lowest active service count. This minimizes the possibility
> +First, the CRM builds a set of nodes available to the HA resource. If the
> +resource is part of a node affinity rule, the set is reduced to the
> +highest priority nodes in the node affinity rule. If the resource is part
> +of a resource affinity rule, the set is further reduced to fufill their
> +constraints, which is either keeping the HA resource on the same node as
> +some other HA resources or keeping the HA resource on a different node
> +than some other HA resources. Finally, the CRM selects the node with the
> +lowest load according to the used scheduler to minimize the possibility
>  of an overloaded node.
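
Just to check that I read the new selection order correctly - roughly like
this? (made-up Python sketch, all names and data structures are invented,
not the actual HA manager code)

    from dataclasses import dataclass, field

    @dataclass
    class NodeAffinityRule:
        nodes: dict          # node name -> priority, higher wins (made-up model)

    @dataclass
    class ResourceAffinityRule:
        positive: bool       # True: keep together, False: keep separate
        other_nodes: set = field(default_factory=set)  # nodes the other members run on

    def select_recovery_node(online_nodes, loads, node_rule=None, resource_rules=()):
        # 0) start from all online and quorate nodes
        candidates = set(online_nodes)

        # 1) node affinity: reduce to the highest-priority nodes of the rule
        #    (assuming it falls back to all online nodes if none of the rule
        #    nodes is available, like non-restricted groups did)
        if node_rule:
            in_rule = candidates & set(node_rule.nodes)
            if in_rule:
                top = max(node_rule.nodes[n] for n in in_rule)
                candidates = {n for n in in_rule if node_rule.nodes[n] == top}

        # 2) resource affinity: stay with / away from the other rule members
        for rule in resource_rules:
            if rule.positive:
                candidates &= rule.other_nodes
            else:
                candidates -= rule.other_nodes

        # 3) pick the least loaded node according to the used scheduler
        return min(candidates, key=lambda n: loads.get(n, 0), default=None)

If that is the intended order, the paragraph above captures it well.
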
>
> -CAUTION: On node failure, the CRM distributes services to the
> -remaining nodes. This increases the service count on those nodes, and
> +CAUTION: On node failure, the CRM distributes resources to the
> +remaining nodes. This increases the resource count on those nodes, and
>  can lead to high load, especially on small clusters. Please design
>  your cluster so that it can handle such worst case scenarios.
>
> @@ -1102,7 +1105,7 @@ You can use the manual maintenance mode to mark the node as unavailable for HA
>  operation, prompting all services managed by HA to migrate to other nodes.

forgot to change the `service` here?

>
>  The target nodes for these migrations are selected from the other currently
> -available nodes, and determined by the HA group configuration and the configured
> +available nodes, and determined by the HA rules configuration and the configured
>  cluster resource scheduler (CRS) mode.
>  During each migration, the original node will be recorded in the HA managers'
>  state, so that the service can be moved back again automatically once the

forgot to change the `service` here?

> @@ -1173,14 +1176,12 @@ This triggers a migration of all HA Services currently located on this node.

forgot to change the `service` here?

>  The LRM will try to delay the shutdown process, until all running services get

forgot to change the `service` here?

>  moved away. But, this expects that the running services *can* be migrated to

forgot to change the `service` here?

>  another node. In other words, the service must not be locally bound, for example

forgot to change the `service` here?

> -by using hardware passthrough. As non-group member nodes are considered as
> -runnable target if no group member is available, this policy can still be used
> -when making use of HA groups with only some nodes selected. But, marking a group
> -as 'restricted' tells the HA manager that the service cannot run outside of the
> -chosen set of nodes. If all of those nodes are unavailable, the shutdown will
> -hang until you manually intervene. Once the shut down node comes back online
> -again, the previously displaced services will be moved back, if they were not
> -already manually migrated in-between.
> +by using hardware passthrough. For example, strict node affinity rules tell the

s/For example, s/S/

> +HA Manager that the service cannot run outside of the chosen set of nodes. If all
> +of those nodes are unavailable, the shutdown will hang until you manually

s/those/these/

> +intervene. Once the shut down node comes back online again, the previously
> +displaced services will be moved back, if they were not already manually migrated
> +in-between.
>
>  NOTE: The watchdog is still active during the migration process on shutdown.
>  If the node loses quorum it will be fenced and the services will be recovered.
> @@ -1266,8 +1267,8 @@ The change will be in effect starting with the next manager round (after a few
>  seconds).
>
>  For each service that needs to be recovered or migrated, the scheduler
> -iteratively chooses the best node among the nodes with the highest priority in
> -the service's group.
> +iteratively chooses the best node among the nodes that are available to
> +the service according to their HA rules, if any.

Doesn't the scheduler take the ha node affinity priority into consideration here?
And: s/service/resource/

>
>  NOTE: There are plans to add modes for (static and dynamic) load-balancing in
>  the future.
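
To make my question above a bit more concrete: with groups, the scheduler
only looked at the highest-priority group nodes and then balanced them
according to the configured CRS mode. I would naively expect node affinity
rules to behave the same, along the lines of this made-up Python model (not
the actual CRS/manager code, names are invented):

    def schedule(resources, online_nodes, node_affinity, loads):
        """Iteratively place resources on the least loaded allowed node.

        node_affinity: resource -> {node: priority}; empty dict = no rule.
        loads: node -> current resource count, as with the basic scheduler.
        """
        placement = {}
        for res in resources:
            allowed = set(online_nodes)
            prios = node_affinity.get(res, {})
            in_rule = allowed & set(prios)
            if in_rule:
                top = max(prios[n] for n in in_rule)
                # this is the part I am asking about: is the candidate set
                # still reduced to the highest-priority nodes, as with groups?
                allowed = {n for n in in_rule if prios[n] == top}
            if not allowed:
                continue  # nothing suitable, the resource stays where it is
            node = min(allowed, key=lambda n: loads.get(n, 0))
            placement[res] = node
            loads[node] = loads.get(node, 0) + 1  # counts towards the next pick
        return placement

If it does, it might be worth mentioning the priorities explicitly in this
paragraph as well.
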