Date: Tue, 31 Mar 2026 11:32:59 +0200
Subject: Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
From: "Daniel Kral"
To: "Dominik Rusovac", "Michael Köppl"
References: <20260330144101.668747-1-d.kral@proxmox.com> <20260330144101.668747-36-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

On Tue Mar 31, 2026 at 11:16 AM CEST, Dominik Rusovac wrote:
> On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
>> 2 comments inline
>>
>> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>>
>> [snip]
>>
>>> +    my $candidates = $self->get_resource_migration_candidates();
>>> +
>>> +    my $result;
>>> +    if ($method eq 'bruteforce') {
>>> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
>>> +    } elsif ($method eq 'topsis') {
>>> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
>>> +    }
>>> +
>>> +    # happens if $candidates is empty or $method isn't handled above
>>> +    return if !$result;
>>> +
>>> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
>>> +
>>> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>>
>> Since you get $imbalance from a function that returns 0.0 for the case
>> that the cluster load is perfectly balanced (?), you could run into
>> division by 0 here, no?
>>
>
> technically this could happen, however an imbalance of 0.0 certainly
> should not exceed a threshold (this is the case that "the cluster load
> is perfectly balanced"); so the $relative_change ought to be never computed
>

Good catch, thanks to you both!
Even though it's impractical, users can still set the threshold to 0.0,
which can actually cause a division by zero here, because the imbalance
is compared against the threshold with a >= relation, so an imbalance of
0.0 still counts as exceeding it.

# cat imbalance-zero.pl
#!/usr/bin/perl

use v5.36;

my $imbalance = 0.0;
my $threshold = 0.0;
my $hold_duration = 3;
my $sustained_imbalance_round = 0;

sub test {
    if ($imbalance < $threshold) {
        $sustained_imbalance_round = 0;
        return;
    } else {
        $sustained_imbalance_round++;
        print "imbalance threshold exceeded\n";
        return if $sustained_imbalance_round < $hold_duration;
        print "sustained high imbalance\n";
        $sustained_imbalance_round = 0;
    }

    my $target_imbalance = 0.0;
    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
}

test();
test();
test();
# chmod +x imbalance-zero.pl
# ./imbalance-zero.pl
imbalance threshold exceeded
imbalance threshold exceeded
imbalance threshold exceeded
sustained high imbalance
Illegal division by zero at ./imbalance-zero.pl line 23.

With a threshold of 0.0 the system is rather unstable anyway (same if
$margin = 0.0), because it always tries to load balance every
$hold_duration HA rounds.

I'm not sure whether we should prevent this by adjusting the range for
both the threshold and the margin to be larger than some minimum value,
so that the load balancing system won't become unstable.

>>> +    return if $relative_change < $margin;
>>> +
>>> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
>>> +
>>> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
>>> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
>>> +    my $cmd = "$task $sid $target";
>>> +
>>> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
>>> +    $haenv->log(
>>> +        'info',
>>> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
>>> +    );
>>> +
>>> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
>>> +}
>>> +
>
> [snip]
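
For illustration only, here is a minimal sketch of one way the division
discussed above could be guarded. It is not taken from the patch: the
helper name relative_change() and the lower bound $MIN_IMBALANCE are
hypothetical. The helper returns undef instead of dividing when the
current imbalance is at or below that bound, so a caller could skip
rebalancing for that round.

```perl
#!/usr/bin/perl

use v5.36;

# Hypothetical lower bound (an assumption, not part of the patch):
# imbalances at or below this value are treated as already balanced.
my $MIN_IMBALANCE = 1e-9;

# Hypothetical helper: returns the relative improvement of moving from
# $imbalance to $target_imbalance, or undef if $imbalance is too small
# to be used as a divisor.
sub relative_change($imbalance, $target_imbalance) {
    return undef if $imbalance <= $MIN_IMBALANCE;
    return ($imbalance - $target_imbalance) / $imbalance;
}

# Exercise the helper with the failing case from the thread (0.0)
# and with one ordinary case.
for my $case ([0.0, 0.0], [0.4, 0.1]) {
    my ($imbalance, $target_imbalance) = $case->@*;
    my $change = relative_change($imbalance, $target_imbalance);
    if (defined($change)) {
        say "imbalance $imbalance: relative change $change";
    } else {
        say "imbalance $imbalance: skipped, nothing to improve";
    }
}
```

A guard like this only avoids the crash. It does not address the
instability with a threshold or margin of 0.0, which would still need
the minimum-value restriction mentioned above or a similar measure.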