From: "Daniel Kral"
Date: Tue, 31 Mar 2026 15:50:14 +0200
Subject: Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
In-Reply-To: <20260330144101.668747-36-d.kral@proxmox.com>
References: <20260330144101.668747-1-d.kral@proxmox.com> <20260330144101.668747-36-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
> +sub load_balance {
> +    my ($self) = @_;
> +
> +    my ($crs, $haenv, $online_node_usage) = $self->@{qw(crs haenv online_node_usage)};
> +    my ($auto_rebalance_opts) = $crs->{auto_rebalance};
> +
> +    return if !$auto_rebalance_opts->{enable};
> +    return if $crs->{scheduler} ne 'static' && $crs->{scheduler} ne 'dynamic';

We do not implement the load-balancing methods for PVE::HA::Usage::Basic.
If for some reason recompute_online_node_usage() falls back to
PVE::HA::Usage::Basic instead of the selected 'static' or 'dynamic' CRS
mode, then the guarantee this check is meant to provide no longer holds,
and the first call to a usage method here
($online_node_usage->calculate_node_imbalance()) will fail.

recompute_online_node_usage() can fail rather easily, e.g. with 'dynamic'
if pvestatd is not running on one of the nodes:
PVE::HA::Usage::Dynamic::add_node() will then fail, because there is no
recent usage data for every node to correctly represent the cluster's
node usage.

I'll fix this either by changing the $self->{crs}->{mode} value when we
fall back, or at least by giving Basic sensible implementations as well,
such as 0.0 for node_imbalance and empty lists for the
score_best_balancing_migrations{,_topsis}() methods, or even both.
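As a rough sketch of the second option, assuming the method names used in the patch and in this reply, Basic could provide neutral results along these lines. The package name HypotheticalBasicUsage, its constructor, and the return shapes (undef from the select_* methods, an empty list from the score_* methods) are assumptions for illustration, not the actual PVE::HA::Usage::Basic code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::HA::Usage::Basic; only the neutral
# load-balancing methods suggested in the review are sketched here.
package HypotheticalBasicUsage {

    # Stand-in constructor for this sketch only; the real class differs.
    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    # Basic keeps no per-node load figures, so report a balanced cluster.
    sub calculate_node_imbalance {
        my ($self) = @_;
        return 0.0;
    }

    # Without load figures no migration can be scored: return an empty list.
    sub score_best_balancing_migrations {
        my ($self, $candidates) = @_;
        return ();
    }

    sub score_best_balancing_migrations_topsis {
        my ($self, $candidates) = @_;
        return ();
    }

    # A bare return yields undef in scalar context, which load_balance()
    # already treats as "nothing to migrate" via 'return if !$result'.
    sub select_best_balancing_migration {
        my ($self, $candidates) = @_;
        return;
    }

    sub select_best_balancing_migration_topsis {
        my ($self, $candidates) = @_;
        return;
    }
}
```

With such stubs, load_balance() stops either at the threshold check (an imbalance of 0.0 is below any positive threshold) or at "return if !$result", so it never reaches the division by $imbalance.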
> +    return if $self->any_resource_motion_queued_or_running();
> +
> +    my ($threshold, $method, $hold_duration, $margin) =
> +        $auto_rebalance_opts->@{qw(threshold method hold_duration margin)};
> +
> +    my $imbalance = $online_node_usage->calculate_node_imbalance();
> +
> +    # do not load balance unless imbalance threshold has been exceeded
> +    # consecutively for $hold_duration calls to load_balance()
> +    if ($imbalance < $threshold) {
> +        $self->{sustained_imbalance_round} = 0;
> +        return;
> +    } else {
> +        $self->{sustained_imbalance_round}++;
> +        return if $self->{sustained_imbalance_round} < $hold_duration;
> +        $self->{sustained_imbalance_round} = 0;
> +    }
> +
> +    my $candidates = $self->get_resource_migration_candidates();
> +
> +    my $result;
> +    if ($method eq 'bruteforce') {
> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
> +    } elsif ($method eq 'topsis') {
> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
> +    }
> +
> +    # happens if $candidates is empty or $method isn't handled above
> +    return if !$result;
> +
> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
> +
> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
> +    return if $relative_change < $margin;
> +
> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
> +
> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
> +    my $cmd = "$task $sid $target";
> +
> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
> +    $haenv->log(
> +        'info',
> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
> +    );
> +
> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
> +}