Message-ID: <8edb8da5-04b4-ed25-c56f-9626a2ee259f@proxmox.com>
Date: Wed, 22 Feb 2023 16:19:55 +0100
From: Fiona Ebner
To: Proxmox VE development discussion, Alexandre Derumier
References: <20230213120021.3783742-1-aderumier@odiso.com> <20230213120021.3783742-16-aderumier@odiso.com>
In-Reply-To: <20230213120021.3783742-16-aderumier@odiso.com>
Subject: Re: [pve-devel] [PATCH v4 qemu-server 15/16] memory: virtio-mem : implement redispatch retry.
On 13.02.23 at 13:00, Alexandre Derumier wrote:
> If some memory can be removed on a specific node,
> we try to rebalance again on other nodes
>
> Signed-off-by: Alexandre Derumier
> ---
>  PVE/QemuServer/Memory.pm | 51 +++++++++++++++++++++++++++-------------
>  1 file changed, 35 insertions(+), 16 deletions(-)
>
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index bf4e92a..f02b4e0 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -201,13 +201,28 @@ my sub get_virtiomem_total_current_size {
>      return $size;
>  }
>
> +my sub get_virtiomem_total_errors_size {
> +    my ($mems) = @_;
> +
> +    my $size = 0;
> +    for my $mem (values %$mems) {
> +        next if !$mem->{error};
> +        $size += $mem->{current};
> +    }
> +    return $size;
> +}
> +
>  my sub balance_virtiomem {
>      my ($vmid, $virtiomems, $blocksize, $target_total) = @_;
>
> -    my $nb_virtiomem = scalar(keys %$virtiomems);
> +    my $nb_virtiomem = scalar(grep { !$_->{error} } values $virtiomems->%*);
>
>      print"try to balance memory on $nb_virtiomem virtiomems\n";
>
> +    die "No more available blocks in virtiomem to balance all requested memory\n"
> +        if $target_total < 0;

I feel like this message is a bit confusing. This can only happen on unplug, right? And reading that "no more blocks are available" sounds like a paradox then. It's rather that no more blocks can be unplugged.

If we really want to, when $target_total is negative, we could set it to 0 (best to do it at the call side already) and still try to unplug everything else. We won't reach the goal anymore, but we could still get closer to it in some cases. That would need a bit more adaptation to avoid an endless loop: we also need to stop if all devices reached their current goal this round (and no new errors appeared), e.g. balance_virtiomem() could just return that info as its return value.

Example:

> update VM 101: -memory 4100,max=65536,virtio=1
> try to balance memory on 2 virtiomems
> virtiomem0: set-requested-size : 0
> virtiomem1: set-requested-size : 4
> virtiomem1: last: 4 current: 4 target: 4
> virtiomem1: completed
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: increase retry: 0
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: increase retry: 1
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: increase retry: 2
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: increase retry: 3
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: increase retry: 4
> virtiomem0: last: 16 current: 16 target: 0
> virtiomem0: too many retry. set error
> virtiomem0: increase retry: 5

Currently it stops here, but with setting $target_total = 0 it continues...

> try to balance memory on 1 virtiomems
> virtiomem1: set-requested-size : 0
> virtiomem1: last: 4 current: 0 target: 0
> virtiomem1: completed

...and gets closer to the goal...

> try to balance memory on 1 virtiomems
> virtiomem1: set-requested-size : 0
> virtiomem1: last: 4 current: 0 target: 0
> virtiomem1: completed
> try to balance memory on 1 virtiomems
> virtiomem1: set-requested-size : 0
> virtiomem1: last: 4 current: 0 target: 0
> virtiomem1: completed

...but then it loops, because I didn't add the other stop condition yet ;). But I'm not sure, it's likely too much magic.
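Very rough sketch of the idea, just for illustration (untested; the surrounding loop, $requested_total and the return value of balance_virtiomem() are made up here, only the two helpers are the ones from this patch):

    while (1) {
        # what the non-failed devices together should end up with; can go
        # negative when the failed devices already hold more than requested
        my $target_total = $requested_total - get_virtiomem_total_errors_size($virtiomems);
        $target_total = 0 if $target_total < 0; # clamp instead of dying

        # imagine balance_virtiomem() returning 1 if at least one device changed
        # its current size or was newly marked with an error this round
        my $progress = balance_virtiomem($vmid, $virtiomems, $blocksize, $target_total);

        last if get_virtiomem_total_current_size($virtiomems) == $requested_total; # goal reached
        last if !$progress; # nothing moved and no new error -> stop, to avoid looping forever
    }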
> + die "No more available virtiomem to balance the remaining memory\n" if $nb_virtiomem == 0; "No more virtiomem devices left to try to ..." might be a bit clearer. Technically, they are still available, we just ignore them because they don't reach the target in time. > + > #if we can't share exactly the same amount, we add the remainder on last node > my $target_aligned = int( $target_total / $nb_virtiomem / $blocksize) * $blocksize; > my $target_remaining = $target_total - ($target_aligned * ($nb_virtiomem-1));