Date: Tue, 31 Mar 2026 11:39:09 +0200
Subject: Re: [PATCH ha-manager v3 35/40] implement automatic rebalancing
From: "Dominik Rusovac"
To: "Daniel Kral", Michael Köppl
References: <20260330144101.668747-1-d.kral@proxmox.com> <20260330144101.668747-36-d.kral@proxmox.com>
List-Id: Proxmox VE development discussion

On Tue Mar 31, 2026 at 11:32 AM CEST, Daniel Kral wrote:
> On Tue Mar 31, 2026 at 11:16 AM CEST, Dominik Rusovac wrote:
>> On Tue Mar 31, 2026 at 11:07 AM CEST, Michael Köppl wrote:
>>> 2 comments inline
>>>
>>> On Mon Mar 30, 2026 at 4:30 PM CEST, Daniel Kral wrote:
>>>
>>> [snip]
>>>
>>>> +    my $candidates = $self->get_resource_migration_candidates();
>>>> +
>>>> +    my $result;
>>>> +    if ($method eq 'bruteforce') {
>>>> +        $result = $online_node_usage->select_best_balancing_migration($candidates);
>>>> +    } elsif ($method eq 'topsis') {
>>>> +        $result = $online_node_usage->select_best_balancing_migration_topsis($candidates);
>>>> +    }
>>>> +
>>>> +    # happens if $candidates is empty or $method isn't handled above
>>>> +    return if !$result;
>>>> +
>>>> +    my ($migration, $target_imbalance) = $result->@{qw(migration imbalance)};
>>>> +
>>>> +    my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>>>
>>> Since you get $imbalance from a function that returns 0.0 for the case
>>> that the cluster load is perfectly balanced (?), you could run into
>>> division by 0 here, no?
>>>
>>
>> technically this could happen, however an imbalance of 0.0 certainly
>> should not exceed a threshold (this is the case that "the cluster load
>> is perfectly balanced"); so the $relative_change ought to be never computed
>>
>
> Good catch, thanks to you both!
>
> Even though it's unpractical, users can still set the threshold to 0.0,
> which could actually cause a division by zero here, because the
> threshold is compared by a >= relation.
>
>
>
>     # cat imbalance-zero.pl
>     #!/usr/bin/perl
>
>     use v5.36;
>
>     my $imbalance = 0.0;
>     my $threshold = 0.0;
>     my $hold_duration = 3;
>     my $sustained_imbalance_round = 0;
>
>     sub test {
>         if ($imbalance < $threshold) {
>             $sustained_imbalance_round = 0;
>             return;
>         } else {
>             $sustained_imbalance_round++;
>             print "imbalance threshold exceeded\n";
>             return if $sustained_imbalance_round < $hold_duration;
>             print "sustained high imbalance\n";
>             $sustained_imbalance_round = 0;
>         }
>
>         my $target_imbalance = 0.0;
>         my $relative_change = ($imbalance - $target_imbalance) / $imbalance;
>     }
>
>     test();
>     test();
>     test();
>     # chmod +x imbalance-zero.pl
>     # ./imbalance-zero.pl
>     imbalance threshold exceeded
>     imbalance threshold exceeded
>     imbalance threshold exceeded
>     sustained high imbalance
>     Illegal division by zero at ./imbalance-zero.pl line 23.
>
>
>
> The system is rather unstable in that regard anyway (same if $margin =
> 0.0), because it always tries to load balance every $hold_duration HA
> rounds.
>
> I'm not sure whether we should prevent this with adjusting the range for
> both the threshold and margin to be at least larger than some minimum
> value, so that the load balancing system won't become unstable.
>

yeah, either this, or you add a guard to return early (before the
threshold guard) whenever imbalance is 0.0, I guess

>>>> +    return if $relative_change < $margin;
>>>> +
>>>> +    my ($sid, $source, $target) = $migration->@{qw(sid source-node target-node)};
>>>> +
>>>> +    my (undef, $type, $id) = $haenv->parse_sid($sid);
>>>> +    my $task = $type eq 'vm' ? "migrate" : "relocate";
>>>> +    my $cmd = "$task $sid $target";
>>>> +
>>>> +    my $target_imbalance_str = int(100 * $target_imbalance + 0.5) / 100;
>>>> +    $haenv->log(
>>>> +        'info',
>>>> +        "auto rebalance - $task $sid to $target (expected target imbalance: $target_imbalance_str)",
>>>> +    );
>>>> +
>>>> +    $self->queue_resource_motion($cmd, $task, $sid, $target);
>>>> +}
>>>> +
>>
>> [snip]
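
For illustration, here is a rough sketch of the early-return guard mentioned above. The helper name relative_imbalance_change and the $margin value are made up for this example and are not part of the patch; in the patch the check would presumably sit inline, before the threshold comparison.

```perl
#!/usr/bin/perl

use v5.36;

# Hypothetical helper (not in the patch): returns the relative improvement a
# migration would bring, or undef when there is nothing to improve.
sub relative_imbalance_change($imbalance, $target_imbalance) {
    # Guard: a perfectly balanced cluster cannot be improved, and dividing
    # by an imbalance of 0.0 would die with "Illegal division by zero".
    return undef if $imbalance <= 0.0;

    return ($imbalance - $target_imbalance) / $imbalance;
}

my $margin = 0.1; # assumed example value, not a default from the patch

for my $case ([0.0, 0.0], [0.5, 0.25]) {
    my ($imbalance, $target_imbalance) = $case->@*;
    my $change = relative_imbalance_change($imbalance, $target_imbalance);

    if (!defined($change)) {
        say "imbalance $imbalance: nothing to rebalance";
    } elsif ($change < $margin) {
        say "imbalance $imbalance: relative change $change is below the margin";
    } else {
        say "imbalance $imbalance: relative change $change, would queue a migration";
    }
}
```

With a guard like this, a threshold of 0.0 would no longer reach the division, although it would not address the instability of rebalancing every $hold_duration rounds that a zero threshold or margin causes.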