From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 72F7E90737 for ; Fri, 3 Feb 2023 14:46:39 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 5CDC654A2 for ; Fri, 3 Feb 2023 14:46:39 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Fri, 3 Feb 2023 14:46:38 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 0BCA4452D4; Fri, 3 Feb 2023 14:46:38 +0100 (CET) Message-ID: Date: Fri, 3 Feb 2023 14:46:36 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 From: Fiona Ebner To: pve-devel@lists.proxmox.com, "aderumier@odiso.com" References: <20230202110344.840195-1-aderumier@odiso.com> <20230202110344.840195-12-aderumier@odiso.com> Content-Language: en-US In-Reply-To: <20230202110344.840195-12-aderumier@odiso.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.042 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment NICE_REPLY_A -0.09 Looks like a legit reply (A) SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/13] memory: add virtio-mem support X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Feb 2023 13:46:39 -0000 Am 02.02.23 um 12:03 schrieb Alexandre Derumier: > @@ -157,6 +168,109 @@ sub get_current_memory { > return $memory->{current}; > } > > +sub get_virtiomem_block_size { > + my ($conf) = @_; > + > + my $sockets = $conf->{sockets} || 1; > + my $MAX_MEM = get_max_mem($conf); > + my $static_memory = get_static_mem($conf, $sockets); > + my $memory = get_current_memory($conf->{memory}); > + > + #virtiomem can map 32000 block size. > + #try to use lowest blocksize, lower = more chance to unplug memory. > + my $blocksize = ($MAX_MEM - $static_memory) / 32000; > + #2MB is the minimum to be aligned with THP > + $blocksize = 2 if $blocksize < 2; > + $blocksize = 2**(ceil(log($blocksize)/log(2))); > + #Linux guest kernel only support 4MiB block currently (kernel <= 6.2) > + $blocksize = 4 if $blocksize < 4; > + > + return $blocksize; > +} > + > +my sub get_virtiomem_total_current_size { > + my ($mems) = @_; > + my $size = 0; > + for my $mem (values %$mems) { > + $size += $mem->{current}; > + } > + return $size; > +} > + > +my sub balance_virtiomem { The function should die if not all memory can be plugged. > + my ($vmid, $virtiomems, $blocksize, $target_virtiomem_total) = @_; > + > + my $nb_virtiomem = keys %$virtiomems; Style nit: explicit scalar() would be nice. > + > + print"try to balance memory on $nb_virtiomem virtiomems\n"; > + die "error. no more available blocks in virtiomem to balance the remaining memory" if $target_virtiomem_total < 0; > + > + #if we can't share exactly the same amount, we add the remainder on last node > + my $virtiomem_target_aligned = int( $target_virtiomem_total / $nb_virtiomem / $blocksize) * $blocksize; > + my $virtiomem_target_remaining = $target_virtiomem_total - ($virtiomem_target_aligned * ($nb_virtiomem-1)); > + > + my $i = 0; > + foreach my $id (sort keys %$virtiomems) { > + my $virtiomem = $virtiomems->{$id}; > + $i++; > + my $virtiomem_target = $i == $nb_virtiomem ? $virtiomem_target_remaining : $virtiomem_target_aligned; > + $virtiomem->{completed} = 0; > + $virtiomem->{retry} = 0; > + $virtiomem->{target} = $virtiomem_target; > + > + print "virtiomem$id: set-requested-size : $virtiomem_target\n"; > + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem_target * 1024 * 1024); Style nit: some lines are over 100 chars and some quite a bit > + } > + > + while (1) { > + > + sleep 1; > + my $total_finished = 0; > + > + foreach my $id (keys %$virtiomems) { > + > + my $virtiomem = $virtiomems->{$id}; > + > + if ($virtiomem->{error} || $virtiomem->{completed}) { > + $total_finished++; > + next; > + } > + > + my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$id", property => "size"); > + $virtiomem->{current} = $size / 1024 / 1024; > + print"virtiomem$id: virtiomem->last: $virtiomem->{last} virtiomem->current: $virtiomem->{current} virtio_mem_target:$virtiomem->{target}\n"; > + > + if($virtiomem->{current} == $virtiomem->{target}) { > + print"virtiomem$id: completed\n"; > + $virtiomem->{completed} = 1; > + next; > + } > + > + if($virtiomem->{current} != $virtiomem->{last}) { > + #if value has changed, but not yet completed > + print "virtiomem$id: changed but don't not reach target yet\n"; > + $virtiomem->{retry} = 0; > + $virtiomem->{last} = $virtiomem->{current}; > + next; > + } > + > + if($virtiomem->{retry} >= 5) { > + print "virtiomem$id: too many retry. set error\n"; > + $virtiomem->{error} = 1; > + #as change is async, we don't want that value change after the api call > + eval { > + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem->{current} * 1024 *1024); > + }; > + } > + print"virtiomem$id: increase retry: $virtiomem->{retry}\n"; > + $virtiomem->{retry}++; > + } > + > + my $nb_virtiomem = keys %$virtiomems; Redeclares variable. The number can't change from before, or am I missing something? > + return if $total_finished == $nb_virtiomem; Style nit: could also use while ($total_finished != $nb_virtiomem) at the beginning of the loop. > + } > +} > + > sub get_numa_node_list { > my ($conf) = @_; > my @numa_map; > @@ -237,7 +351,39 @@ sub qemu_memory_hotplug { > my $MAX_MEM = get_max_mem($conf); > die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $value > $MAX_MEM; > > - if ($value > $memory) { > + my $confmem = parse_memory($conf->{memory}); You can just re-use the existing $oldmem? > + > + if ($confmem->{virtio}) { > + my $blocksize = get_virtiomem_block_size($conf); > + > + my $virtiomems = {}; > + > + for (my $i = 0; $i < $sockets; $i++) { > + my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$i", property => "size"); > + $size = $size / 1024 /1024; > + $virtiomems->{$i} = { > + current => $size, > + last => $size, > + error => 0, > + completed => 0, > + retry => 0 > + }; > + } > + > + my $target_virtiomem_total = $value - $static_memory; > + my $err; > + eval { > + balance_virtiomem($vmid, $virtiomems, $blocksize, $target_virtiomem_total); > + }; > + $err = $@ if $@; > + > + my $current_memory = $static_memory + get_virtiomem_total_current_size($virtiomems); > + $newmem->{current} = $current_memory; > + $conf->{memory} = print_memory($newmem); > + PVE::QemuConfig->write_config($vmid, $conf); > + die $err if $err; > + > + } elsif ($value > $memory) { > > my $numa_hostmap; > > @@ -441,17 +590,42 @@ sub config { > } > > if ($hotplug) { > - foreach_dimm($conf, $vmid, $memory, $static_memory, sub { > - my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_; > > - my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size); > + my $confmem = parse_memory($conf->{memory}); > + > + if ($confmem->{'virtio'}) { > > - push @$cmd, "-object" , $mem_object; > - push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode"; > + my $MAX_MEM = get_max_mem($conf); > + my $node_maxmem = ($MAX_MEM - $static_memory) / $sockets; > + my $node_mem = ($memory - $static_memory) / $sockets; Nit: If the number of $sockets is not a power of 2, I think this breaks. But I guess we already don't support it. Running current version without your patches (for a VM with memory hotplug): root@pve701 ~ # qm set 131 --sockets 3 update VM 131: -sockets 3 root@pve701 ~ # qm set 131 -memory 8192 update VM 131: -memory 8192 root@pve701 ~ # qm start 131 kvm: total memory for NUMA nodes (0x3fffffff) should equal RAM size (0x40000000) start failed: QEMU exited with code 1 I guess we can just fix it up together with the existing rounding issue if/when somebody complains about it ;) > + my $blocksize = get_virtiomem_block_size($conf); > > - die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n" > - if $current_size > $memory; > - }); > + die "memory need to be a multiple of $blocksize MiB with maxmemory $MAX_MEM MiB when virtiomem is enabled\n" > + if $memory % $blocksize != 0; > + > + for (my $i = 0; $i < $sockets; $i++) { > + > + my $id = "virtiomem$i"; > + my $mem_object = print_mem_object($conf, "mem-$id", $node_maxmem); > + push @$cmd, "-object" , "$mem_object,reserve=off"; > + > + my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i"; > + $mem_device .= ",prealloc=on" if $conf->{hugepages}; > + $mem_devices->{$id} = $mem_device; > + } > + } else { > + foreach_dimm($conf, $vmid, $memory, $static_memory, sub { > + my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_; > + > + my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size); > + > + push @$cmd, "-object" , $mem_object; > + push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode"; > + > + die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n" > + if $current_size > $memory; > + }); > + } > } > } >