Date: Tue, 20 Dec 2022 11:26:32 +0100
From: Fiona Ebner
To: "DERUMIER, Alexandre", Proxmox VE development discussion
Subject: Re: [pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support
Message-ID: <15f5554d-3708-ac78-d2e2-1d797e55e211@proxmox.com>
In-Reply-To: <7b9306c429440304fb37601ece5ffdbad0b90e5f.camel@groupe-cyllene.com>

On 20.12.22 10:50, DERUMIER, Alexandre wrote:
>>
>>> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
>>> +    #round next power of 2
>>> +    $blocksize = 2**(int(log($blocksize)/log(2))+1);
>>
>> What if log($blocksize)/log(2) is exactly an integer? Do we still want
>> to add 1 then? If not, please use ceil(...) instead of int(...)+1. Well,
>> I guess it can't happen in practice with the values that are possible
>> for $MAX_MEM and $static_memory, but still.
>>
> The math seems to work out to a power of 2 as blocksize:
>
> log(1)/log(2):0 blocksize:2
> log(2)/log(2):1 blocksize:4

Yes, but here you'd use 4, when 2 would also be good. If that doesn't
matter, feel free to keep it as-is; I was just wondering.
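To spell out the difference (a quick illustrative sketch, not code from
the patch; the helper names are made up):

    use POSIX qw(ceil);

    # patch variant: always bumps to the *next* power of 2, even when
    # the argument already is an exact power of 2
    sub next_pow2_int { return 2**(int(log($_[0])/log(2)) + 1); }

    # ceil variant: keeps exact powers of 2 unchanged
    sub next_pow2_ceil { return 2**(ceil(log($_[0])/log(2))); }

    print next_pow2_int(2), " ", next_pow2_ceil(2), "\n";  # 4 2
    print next_pow2_int(3), " ", next_pow2_ceil(3), "\n";  # 4 4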
It's likely not even relevant in practice, because with the values
$MAX_MEM and $static_memory can take, I'm not sure we can even get
integers for the argument to the first log()?

> with a ceil, the blocksize doesn't start at 2:
>
> log(1)/log(2):0 blocksize:1
> log(2)/log(2):1 blocksize:2

Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater than
1? And if it could get smaller than 1, we also might have issues with
the int()+1 approach, because the result of the first log() will become
negative. To be on the safe side, we could just move the minimum check
up:

    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
    $blocksize = 2 if $blocksize < 2;
    $blocksize = 2**(ceil(log($blocksize)/log(2)));

>>> +    #2MB is the minimum to be aligned with THP
>>> +    $blocksize = 2 if $blocksize < 2;
>>
>> Nit: $blocksize is at least 2**1 after the previous calculation, so
>> this isn't really needed.
>
> Yes, indeed.
>
>>
>>> +           my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i$pciaddr";
>>> +           $mem_device .= ",prealloc=on" if $conf->{hugepages};
>>
>> So prealloc=on for the device, but not prealloc=yes for the object
>> below[0]. Would you mind explaining it to me? I just found the part
>> mentioned for v7.0 here
>>
> Oh, yes, sorry.
> Just look here for details:
> https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00902.html
>
> "A common use case for hugetlb will be using "reserve=off,prealloc=off"
> for the memory backend and "prealloc=on" for the virtio-mem device.
> This way, no huge pages will be reserved for the process, but we can
> recover if there are no actual huge pages when plugging memory. Libvirt
> is already prepared for this."

Thanks! Could you please add it to the commit message?

>>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>>> index a18b974..0187c74 100644
>>> --- a/PVE/QemuServer/PCI.pm
>>> +++ b/PVE/QemuServer/PCI.pm
>>> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
>>>         'scsihw2' => { bus => 4, addr => 1 },
>>>         'scsihw3' => { bus => 4, addr => 2 },
>>>         'scsihw4' => { bus => 4, addr => 3 },
>>> +       'virtiomem0' => { bus => 4, addr => 4 },
>>> +       'virtiomem1' => { bus => 4, addr => 5 },
>>> +       'virtiomem2' => { bus => 4, addr => 6 },
>>> +       'virtiomem3' => { bus => 4, addr => 7 },
>>> +       'virtiomem4' => { bus => 4, addr => 8 },
>>> +       'virtiomem5' => { bus => 4, addr => 9 },
>>> +       'virtiomem6' => { bus => 4, addr => 10 },
>>> +       'virtiomem7' => { bus => 4, addr => 11 },
>>
>> What if $conf->{sockets} > 8? Maybe mention the limitation in the
>> description of the 'virtio' property in the 'memory' string. Is the
>> plan to just add on more virtiomem PCI devices in the future?
>>
> Well, do we really support more than 8 sockets?
> I'm not sure the current NUMA implementation works with more than 8
> sockets, and in the real world, I have never seen servers with more
> than 8 sockets. (I think super-expensive hardware with 16-32 sockets
> is the maximum.)
> AFAIK, the main use case of sockets/NUMA nodes is auto NUMA balancing,
> to try to map the QEMU NUMA nodes to physical NUMA nodes.
>
> If needed, we could add a dedicated PCI bridge (like for virtioscsi)
> with 32 devices,
> and use something like virtiomem0 => addr=1, virtiomem1 => addr=2.

Okay, makes sense. You never know what the future brings, but if it'll
take some time until we need to add more (and not that much more), I
guess it's fine.
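For reference, a rough sketch of how the backend/device pair from the
quoted recommendation could fit together in the hugepages case. The
-object line is not part of this excerpt, so its exact options, like
the sample values below, are assumptions, not the patch's actual code:

    # made-up sample values for illustration
    my ($id, $node, $node_mem, $blocksize) = ('virtiomem0', 0, 4096, 2);
    my $pciaddr = ',bus=pci.4,addr=0x4';    # hypothetical PCI address
    my $hugepages = 1;

    # backend: neither reserve nor preallocate huge pages up front
    my $mem_object = "memory-backend-file,id=mem-$id,size=${node_mem}M"
        . ",mem-path=/dev/hugepages,share=on,reserve=off,prealloc=off";

    # device: preallocate only when memory is actually plugged, so a
    # huge page shortage fails the plug rather than the whole process
    my $mem_device = "virtio-mem-pci,block-size=${blocksize}M"
        . ",requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$node$pciaddr";
    $mem_device .= ",prealloc=on" if $hugepages;

    print "-object $mem_object -device $mem_device\n";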