From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 9518F93CBF for ; Wed, 22 Feb 2023 16:19:51 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 7F0CD1A899 for ; Wed, 22 Feb 2023 16:19:51 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Wed, 22 Feb 2023 16:19:50 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 4F6E848141; Wed, 22 Feb 2023 16:19:50 +0100 (CET) Message-ID: <9fff5f59-da9a-505c-9427-9320a02a72fb@proxmox.com> Date: Wed, 22 Feb 2023 16:19:49 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 From: Fiona Ebner To: Proxmox VE development discussion , Alexandre Derumier References: <20230213120021.3783742-1-aderumier@odiso.com> <20230213120021.3783742-15-aderumier@odiso.com> Content-Language: en-US In-Reply-To: <20230213120021.3783742-15-aderumier@odiso.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.044 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment NICE_REPLY_A -0.095 Looks like a legit reply (A) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH v4 qemu-server 14/16] memory: add virtio-mem support X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Feb 2023 15:19:51 -0000 Am 13.02.23 um 13:00 schrieb Alexandre Derumier: > diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm > index 1b1c99d..bf4e92a 100644 > --- a/PVE/QemuServer/Memory.pm > +++ b/PVE/QemuServer/Memory.pm > @@ -3,6 +3,8 @@ package PVE::QemuServer::Memory; > use strict; > use warnings; > > +use POSIX qw(ceil); > + > use PVE::JSONSchema; > use PVE::Tools qw(run_command lock_file lock_file_full file_read_firstline dir_glob_foreach); > use PVE::Exception qw(raise raise_param_exc); > @@ -16,6 +18,7 @@ our @EXPORT_OK = qw( > get_current_memory > parse_memory > get_host_max_mem > +get_virtiomem_block_size > ); > > my $MAX_NUMA = 8; > @@ -37,6 +40,12 @@ our $memory_fmt = { > maximum => 4194304, > format => 'pve-qm-memory-max', > }, > + virtio => { > + description => "Enable virtio-mem memory (Experimental: Only works with Linux guest with kernel >= 5.10)", Nit: How about "Use virtio-mem devices for hotplug (Experimental: ...)", then people immediately know it's for hotplug. > + type => 'boolean', > + optional => 1, > + default => 0, > + }, > }; > > PVE::JSONSchema::register_format('pve-qm-memory-max', \&verify_qm_memory_max); > @@ -72,7 +81,9 @@ my sub get_static_mem { > my $static_memory = 0; > my $memory = parse_memory($conf->{memory}); > > - if ($memory->{max}) { > + if ($memory->{virtio}) { > + $static_memory = 4096; > + } elsif ($memory->{max}) { > my $dimm_size = $memory->{max} / $MAX_SLOTS; > #static mem can't be lower than 4G and lower than 1 dimmsize by socket > $static_memory = $dimm_size * $sockets; > @@ -161,6 +172,117 @@ sub get_current_memory { > return $memory->{current}; > } > > +sub get_virtiomem_block_size { > + my ($conf) = @_; > + > + my $sockets = $conf->{sockets} || 1; > + my $MAX_MEM = get_max_mem($conf); > + my $static_memory = get_static_mem($conf, $sockets); Nit: Not making a difference with the current implemenetation, but this should pass 1 for hotplug (we only use the virtio-mem devices for hotplug). > + my $memory = get_current_memory($conf->{memory}); > + > + #virtiomem can map 32000 block size. > + #try to use lowest blocksize, lower = more chance to unplug memory. > + my $blocksize = ($MAX_MEM - $static_memory) / 32000; > + #2MB is the minimum to be aligned with THP > + $blocksize = 2 if $blocksize < 2; > + $blocksize = 2**(ceil(log($blocksize)/log(2))); > + #Linux guest kernel only support 4MiB block currently (kernel <= 6.2) > + $blocksize = 4 if $blocksize < 4; > + > + return $blocksize; > +} > + > +my sub get_virtiomem_total_current_size { > + my ($mems) = @_; > + my $size = 0; > + for my $mem (values %$mems) { > + $size += $mem->{current}; > + } > + return $size; > +} > + > +my sub balance_virtiomem { > + my ($vmid, $virtiomems, $blocksize, $target_total) = @_; > + > + my $nb_virtiomem = scalar(keys %$virtiomems); > + > + print"try to balance memory on $nb_virtiomem virtiomems\n"; > + > + #if we can't share exactly the same amount, we add the remainder on last node > + my $target_aligned = int( $target_total / $nb_virtiomem / $blocksize) * $blocksize; > + my $target_remaining = $target_total - ($target_aligned * ($nb_virtiomem-1)); > + > + my $i = 0; > + foreach my $id (sort keys %$virtiomems) { > + my $virtiomem = $virtiomems->{$id}; > + $i++; > + my $virtiomem_target = $i == $nb_virtiomem ? $target_remaining : $target_aligned; > + $virtiomem->{completed} = 0; > + $virtiomem->{retry} = 0; > + $virtiomem->{target} = $virtiomem_target; > + > + print "virtiomem$id: set-requested-size : $virtiomem_target\n"; > + mon_cmd($vmid, 'qom-set', > + path => "/machine/peripheral/virtiomem$id", > + property => "requested-size", > + value => $virtiomem_target * 1024 * 1024); Style nit: trailing spaces and should really put each argument on its own line, with mon_cmd( and the final ) on their own line too. > + } > + > + my $total_finished = 0; > + my $error = undef; > + > + while ($total_finished != $nb_virtiomem) { > + > + sleep 1; > + > + $total_finished = 0; > + > + foreach my $id (keys %$virtiomems) { > + > + my $virtiomem = $virtiomems->{$id}; > + > + if ($virtiomem->{error} || $virtiomem->{completed}) { > + $total_finished++; > + next; > + } > + > + my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$id", property => "size"); > + $virtiomem->{current} = $size / 1024 / 1024; > + print"virtiomem$id: last: $virtiomem->{last} current: $virtiomem->{current} target: $virtiomem->{target}\n"; [0] marker so I can reference this message below :) > + > + if($virtiomem->{current} == $virtiomem->{target}) { > + print"virtiomem$id: completed\n"; > + $virtiomem->{completed} = 1; > + next; > + } > + > + if($virtiomem->{current} != $virtiomem->{last}) { > + #if value has changed, but not yet completed > + print "virtiomem$id: changed but don't not reach target yet\n"; "don't not" is wrong. But do we really need this print? I feel like the above[0] is already enough. It already contains the information about last and current. > + $virtiomem->{retry} = 0; > + $virtiomem->{last} = $virtiomem->{current}; > + next; > + } > + > + if($virtiomem->{retry} >= 5) { > + print "virtiomem$id: too many retry. set error\n"; s/retry/retries/ But I'd also change the message to be a bit more telling to users, "set error" could mean anything. Maybe something like: "virtiomem$id: target memory still not reached, ignoring device from now on"? > + $virtiomem->{error} = 1; > + $error = 1; > + #as change is async, we don't want that value change after the api call > + eval { > + mon_cmd($vmid, 'qom-set', > + path => "/machine/peripheral/virtiomem$id", > + property => "requested-size", > + value => $virtiomem->{current} * 1024 *1024); > + }; > + } > + print"virtiomem$id: increase retry: $virtiomem->{retry}\n"; Maybe add output the retry counter in the message [0] to avoid output bloat? > + $virtiomem->{retry}++; > + } > + } > + die "No more available blocks in virtiomem to balance all requested memory\n" if $error; > +} > + > sub get_numa_node_list { > my ($conf) = @_; > my @numa_map; > @@ -247,7 +369,39 @@ sub qemu_memory_hotplug { > my $MAX_MEM = get_max_mem($conf); > die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $value > $MAX_MEM; > > - if ($value > $memory) { > + my $confmem = parse_memory($conf->{memory}); This is already $oldmem, no need for this second variable.