From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id F01871FF16E
	for <inbox@lore.proxmox.com>; Mon, 28 Apr 2025 15:20:50 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 26EC131827;
	Mon, 28 Apr 2025 15:20:59 +0200 (CEST)
Message-ID: <28ca2817-2a17-4b67-b245-2b40462b776a@proxmox.com>
Date: Mon, 28 Apr 2025 15:20:24 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Daniel Kral <d.kral@proxmox.com>
References: <20250325151254.193177-1-d.kral@proxmox.com>
 <20250325151254.193177-12-d.kral@proxmox.com>
Content-Language: en-US
From: Fiona Ebner <f.ebner@proxmox.com>
In-Reply-To: <20250325151254.193177-12-d.kral@proxmox.com>
Subject: Re: [pve-devel] [PATCH ha-manager 10/15] sim: resources: add option
 to limit start and migrate tries to node
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pve-devel-bounces@lists.proxmox.com
Sender: "pve-devel" <pve-devel-bounces@lists.proxmox.com>

Am 25.03.25 um 16:12 schrieb Daniel Kral:
> Add an option to the VirtFail's name to allow the start and migrate fail
> counts to only apply on a certain node number with a specific naming
> scheme.
> 
> This allows a slightly more elaborate test type, e.g. where a service
> can start on one node (or any other in that case), but fails to start on
> a specific node, which it is expected to start on after a migration.
> 
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>

With some nits:

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>

> ---
>  src/PVE/HA/Sim/Resources/VirtFail.pm | 37 +++++++++++++++++++---------
>  1 file changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/src/PVE/HA/Sim/Resources/VirtFail.pm b/src/PVE/HA/Sim/Resources/VirtFail.pm
> index ce88391..fddecd6 100644
> --- a/src/PVE/HA/Sim/Resources/VirtFail.pm
> +++ b/src/PVE/HA/Sim/Resources/VirtFail.pm
> @@ -10,25 +10,36 @@ use base qw(PVE::HA::Sim::Resources);
>  # To make it more interesting we can encode some behavior in the VMID
>  # with the following format, where fa: is the type and a, b, c, ...
>  # are digits in base 10, i.e. the full service ID would be:
> -#   fa:abcde
> +#   fa:abcdef
>  # And the digits after the fa: type prefix would mean:
>  #   - a: no meaning but can be used for differentiating similar resources
>  #   - b: how many tries are needed to start correctly (0 is normal behavior) (should be set)
>  #   - c: how many tries are needed to migrate correctly (0 is normal behavior) (should be set)
>  #   - d: should shutdown be successful (0 = yes, anything else no) (optional)
>  #   - e: return value of $plugin->exists() defaults to 1 if not set (optional)
> +#   - f: limits the constraints of b and c to the nodeX (0 = apply to all nodes) (optional)

This requires such tests to use node names of exactly the form nodeX
(with X being a single digit), but that can be fine IMHO.

>  
>  my $decode_id = sub {
>      my $id = shift;
>  
> -    my ($start, $migrate, $stop, $exists) = $id =~ /^\d(\d)(\d)(\d)?(\d)?/g;
> +    my ($start, $migrate, $stop, $exists, $limit_to_node) = $id =~ /^\d(\d)(\d)(\d)?(\d)?(\d)?/g;
>  
>      $start = 0 if !defined($start);
>      $migrate = 0 if !defined($migrate);
>      $stop = 0 if !defined($stop);
>      $exists = 1 if !defined($exists);
> +    $limit_to_node = 0 if !defined($limit_to_node);
>  
> -    return ($start, $migrate, $stop, $exists)
> +    return ($start, $migrate, $stop, $exists, $limit_to_node);
> +};
> +
> +my $should_retry_action = sub {

"action" feels a bit too general to me, since this does not apply to all
actions. Also, the helper determines whether the action itself should
fail; retrying is then just the consequence.

> +    my ($haenv, $limit_to_node) = @_;
> +
> +    my ($node) = $haenv->nodename() =~ /^node(\d)/g;

No need for a regex: you could check $limit_to_node == 0 early and then
compare the node name with the exactly known value.
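
An untested sketch of what I mean. To keep it self-contained, it takes
the node name as a plain string (in the plugin it would come from
$haenv->nodename()), and the helper name is only a suggestion:

```perl
use strict;
use warnings;

# Hypothetical replacement for $should_retry_action; the name is only a
# suggestion and the node name is passed in directly for illustration.
my $should_fail_on_node = sub {
    my ($nodename, $limit_to_node) = @_;

    # 0 means the start/migrate failure counts apply on every node
    return 1 if $limit_to_node == 0;

    # otherwise they apply only on the exactly known node name, e.g. "node3"
    return $nodename eq "node$limit_to_node";
};
```

Note that, unlike the /^node(\d)/ match, this would not treat a name
like node10 as node 1.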

> +    $node = 0 if !defined($node);
> +
> +    return $limit_to_node == 0 || $limit_to_node == $node;
>  };
>  
>  my $tries = {
> @@ -53,12 +64,14 @@ sub exists {
>  sub start {
>      my ($class, $haenv, $id) = @_;
>  
> -    my ($start_failure_count) = &$decode_id($id);
> +    my ($start_failure_count, $limit_to_node) = (&$decode_id($id))[0,4];

Style nit: this is pre-existing, but you could go for $decode_id->($id)
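
For illustration, a small standalone sketch of the arrow-call form
combined with the list slice. The decoder below is only a stand-in
mirroring the defaults from the patch, and the ID 120013 is made up:

```perl
use strict;
use warnings;

# Stand-in for the decoder in VirtFail.pm, using the same defaults as the patch.
my $decode_id = sub {
    my $id = shift;

    my ($start, $migrate, $stop, $exists, $limit_to_node) =
        $id =~ /^\d(\d)(\d)(\d)?(\d)?(\d)?/;

    return ($start // 0, $migrate // 0, $stop // 0, $exists // 1, $limit_to_node // 0);
};

# Arrow call plus list slice instead of &$decode_id($id):
# index 0 is the start failure count, index 4 the node limit.
my ($start_failure_count, $limit_to_node) = ($decode_id->('120013'))[0, 4];

print "start=$start_failure_count limit=$limit_to_node\n"; # prints "start=2 limit=3"
```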

>  
> -    $tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
> -    $tries->{start}->{$id}++;
> +    if ($should_retry_action->($haenv, $limit_to_node)) {
> +	$tries->{start}->{$id} = 0 if !$tries->{start}->{$id};
> +	$tries->{start}->{$id}++;
>  
> -    return if $start_failure_count >= $tries->{start}->{$id};
> +	return if $start_failure_count >= $tries->{start}->{$id};
> +    }
>  
>      $tries->{start}->{$id} = 0; # reset counts
>  
> @@ -79,12 +92,14 @@ sub shutdown {
>  sub migrate {
>      my ($class, $haenv, $id, $target, $online) = @_;
>  
> -    my (undef, $migrate_failure_count) = &$decode_id($id);
> +    my ($migrate_failure_count, $limit_to_node) = (&$decode_id($id))[1,4];

Same style nit as above.

>  
> -    $tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
> -    $tries->{migrate}->{$id}++;
> +    if ($should_retry_action->($haenv, $limit_to_node)) {
> +	$tries->{migrate}->{$id} = 0 if !$tries->{migrate}->{$id};
> +	$tries->{migrate}->{$id}++;
>  
> -    return if $migrate_failure_count >= $tries->{migrate}->{$id};
> +	return if $migrate_failure_count >= $tries->{migrate}->{$id};
> +    }
>  
>      $tries->{migrate}->{$id} = 0; # reset counts
>  


