From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com, aderumier@odiso.com
Subject: Re: [pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support
Date: Fri, 16 Dec 2022 14:42:15 +0100
Message-ID: <87423a6b-a17e-5ea4-9176-cd81e96c5693@proxmox.com>
In-Reply-To: <20221209192726.1499142-9-aderumier@odiso.com>
On 09.12.22 at 20:27, Alexandre Derumier wrote:
> A 4GB static memory is needed for DMA+boot memory, as this memory
> can almost never be unplugged.
>
> One virtio-mem PCI device is set up for each NUMA node on the pci.4 bridge.
>
> virtio-mem uses a fixed block size with 32000 blocks.
> The block size is computed as (maxmemory - 4096) / 32000, with a minimum
> of 2MB to map THP
> (lower block size = more chance to unplug memory).
>
> fixes:
> https://bugzilla.proxmox.com/show_bug.cgi?id=931
Comment 7 talks about Windows, and virtio-mem is not supported there at
the moment, so I don't think we should consider it fixed ;)
> https://bugzilla.proxmox.com/show_bug.cgi?id=2949
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
> PVE/QemuServer.pm | 8 +++-
> PVE/QemuServer/Memory.pm | 98 +++++++++++++++++++++++++++++++++++++---
> PVE/QemuServer/PCI.pm | 8 ++++
> 3 files changed, 106 insertions(+), 8 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 0d5b550..43fab29 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -285,6 +285,12 @@ my $memory_fmt = {
> optional => 1,
> enum => [@max_memory_list],
> },
> + virtio => {
> + description => "enable virtio-mem memory",
We really should mention that it's a technology preview, and that it
only works for Linux >=5.8 guests currently:
https://virtio-mem.gitlab.io/user-guide/#important-current-limitations
> + type => 'boolean',
> + optional => 1,
> + default => 0,
> + },
> };
>
> my $meta_info_fmt = {
> @@ -3898,7 +3904,7 @@ sub config_to_command {
> push @$cmd, get_cpu_options($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough);
> }
>
> - PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd);
> + PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $devices, $bridges, $arch, $machine_type);
>
> push @$cmd, '-S' if $conf->{freeze};
>
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 8bbbf07..70ab65a 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -8,6 +8,8 @@ use PVE::Exception qw(raise raise_param_exc);
>
> use PVE::QemuServer;
> use PVE::QemuServer::Monitor qw(mon_cmd);
> +use PVE::QemuServer::PCI qw(print_pci_addr);
> +
> use base qw(Exporter);
>
> our @EXPORT_OK = qw(
> @@ -27,7 +29,9 @@ my sub get_static_mem {
> my $static_memory = 0;
> my $memory = PVE::QemuServer::parse_memory($conf->{memory});
>
> - if($memory->{max}) {
> + if ($memory->{virtio}) {
> + $static_memory = 4096;
> + } elsif ($memory->{max}) {
> my $dimm_size = $memory->{max} / 64;
> #static mem can't be lower than 4G and lower than 1 dimmsize by socket
> $static_memory = $dimm_size * $sockets;
> @@ -102,6 +106,24 @@ my sub get_max_mem {
> return $cpu_max_mem;
> }
>
> +my sub get_virtiomem_block_size {
> + my ($conf) = @_;
> +
> + my $MAX_MEM = get_max_mem($conf);
> + my $static_memory = get_static_mem($conf);
> + my $memory = get_current_memory($conf);
> + #virtiomem can map 32000 blocks. try to use lowest blocksize, lower = more chance to unplug memory.
Style nit: line too long
> + my $blocksize = ($MAX_MEM - $static_memory) / 32000;
> + #round next power of 2
> + $blocksize = 2**(int(log($blocksize)/log(2))+1);
What if log($blocksize)/log(2) is exactly an integer? Do we still want
to add 1 then? If not, please use ceil(...) instead of int(...)+1. Well,
I guess it can't happen in practice, given the possible values for
$MAX_MEM and $static_memory, but still.
> + #2MB is the minimum to be aligned with THP
> + $blocksize = 2 if $blocksize < 2;
Nit: $blocksize is at least 2**1 after the previous calculation, so this
isn't really needed.
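To make the rounding question concrete, here is a minimal sketch. It assumes the intended result is the smallest power of two (in MB) that is at least (max memory - static memory) / 32000, with a floor of 2. `virtiomem_block_size` and `check_virtiomem_memory` are hypothetical standalone helpers, not the patch's functions.

The sketch doubles an integer instead of using log($x)/log(2). This keeps an exact power of two unchanged, and it avoids relying on the floating-point quotient being exact. With a ceil()-style round-up, the minimum of 2 is no longer redundant. In this loop it is simply the starting value.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): smallest power-of-two block size
# in MB that lets (max_mem - static_mem) fit into 32000 virtio-mem blocks.
sub virtiomem_block_size {
    my ($max_mem, $static_mem) = @_;    # both in MB
    my $needed = ($max_mem - $static_mem) / 32000;

    # Start at 2 MB (THP size) and double until large enough. An exact
    # power of two is kept as-is instead of being bumped to the next one.
    my $block_size = 2;
    $block_size *= 2 while $block_size < $needed;
    return $block_size;
}

# Same divisibility check as the patch, but with a trailing newline so that
# die() does not append " at FILE line N." to the message.
sub check_virtiomem_memory {
    my ($memory, $block_size) = @_;
    die "memory size needs to be a multiple of $block_size MB when virtio-mem is enabled\n"
        if $memory % $block_size != 0;
    return;
}

# (132096 - 4096) / 32000 == 4 exactly, so this prints 4.
printf "%d\n", virtiomem_block_size(132096, 4096);
```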
> +
> + die "memory size need to be multiple of $blocksize MB when virtio-mem is enabled" if ($memory % $blocksize != 0);
Missing newline in error message.
Style nit: line too long
> +
> + return $blocksize;
> +}
> +
> sub get_current_memory{
> my ($conf) = @_;
>
> @@ -224,7 +246,41 @@ sub qemu_memory_hotplug {
> my $MAX_MEM = get_max_mem($conf);
> die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $value > $MAX_MEM;
>
> - if ($value > $memory) {
> + my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
> +
> + if ($confmem->{virtio}) {
> + my $blocksize = get_virtiomem_block_size($conf);
> + my $requested_size = ($value - $static_memory) / $sockets * 1024 * 1024;
> + my $totalsize = $static_memory;
> + my $err = undef;
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> +
> + my $id = "virtiomem$i";
> + my $retry = 0;
> + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/$id", property => "requested-size", value => int($requested_size));
I'd eval the mon_cmd's and also catch errors there.
> +
> + my $size = 0;
> + while (1) {
> + sleep 1;
If there really is no good alternative to this querying loop, I'd rather
issue the qom-set command for all virtio-mem devices first, and then do
the loop. Maybe also move the sleep to the end of the loop.
> + $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/$id", property => "size");
> + $err = 1 if $retry > 5;
> + last if $size eq $requested_size || $retry > 5;
I think ($value - $static_memory) doesn't have to be a multiple of $sockets,
so $requested_size might not be an integer. This should rather compare against
int($requested_size), which is what you set above.
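A rough sketch of the flow suggested in the comments on this hunk:

- eval each monitor call;
- issue qom-set for all devices before polling;
- sleep at the end of each round;
- compare against the integer value that was actually sent.

`$set` and `$get` are hypothetical callbacks standing in for `mon_cmd($vmid, 'qom-set', ...)` and `mon_cmd($vmid, 'qom-get', ...)`. `$sleep` is injectable so the sketch can be exercised without a VM. This is not a drop-in replacement.

```perl
use strict;
use warnings;

# Sketch only: $set/$get stand in for the QMP qom-set/qom-get calls made via
# mon_cmd(); $sleep is injectable (e.g. \&CORE::sleep in real use).
sub resize_virtiomem_devices {
    my ($ids, $requested_size, $set, $get, $sleep, $max_retries) = @_;

    # Compare against exactly what is sent to QEMU (an integer byte count).
    $requested_size = int($requested_size);
    my (%size, @errors, %pending);

    # 1) Issue the resize request for every device first, so they can
    #    resize concurrently instead of one after another.
    for my $id (@$ids) {
        eval { $set->($id, $requested_size) };
        if (my $err = $@) {
            push @errors, "$id: $err";
        } else {
            $pending{$id} = 1;
        }
    }

    # 2) Poll the devices that accepted the request; sleep at the end of
    #    each round, and only if something is still pending.
    for (my $retry = 0; %pending && $retry <= $max_retries; $retry++) {
        for my $id (sort keys %pending) {
            my $cur = eval { $get->($id) };
            if (my $err = $@) {
                push @errors, "$id: $err";
                delete $pending{$id};
                next;
            }
            $size{$id} = $cur;
            delete $pending{$id} if $cur == $requested_size;
        }
        $sleep->(1) if %pending;
    }
    push @errors, "$_: did not reach requested size" for sort keys %pending;

    return (\%size, \@errors);
}
```

The caller can then sum the reached sizes in `%$sizes` (bytes) to compute the total it records in the config.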
> + $retry++;
> + }
> + $totalsize += ($size / 1024 / 1024 );
> + }
> + #update conf after each successful change
> + if($err) {
But this is only done in the error case, not after each successful change.
> + my $mem = { max => $MAX_MEM, virtio => 1};
> + $mem->{current} = $totalsize;
Nit: int($totalsize) just to be sure?
> + $conf->{memory} = PVE::QemuServer::print_memory($mem);
> + PVE::QemuConfig->write_config($vmid, $conf);
> + raise_param_exc({ 'memory' => "error modify virtio memory" }) if $err;
It's not necessarily a parameter issue, please use die instead.
> + }
> + return $totalsize;
The other branches don't (explicitly) return anything.
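One possible shape for the end of this branch, as a minimal sketch. It assumes the config should always record the size that was actually reached, whether or not an error happened. It then reports errors with a plain die rather than raise_param_exc. `$write_conf` is a hypothetical callback standing in for `print_memory()` plus `PVE::QemuConfig->write_config()`. Like the other branches, it returns nothing explicitly.

```perl
use strict;
use warnings;

# Hypothetical sketch: persist the reached size first, then fail with die.
sub finish_virtiomem_resize {
    my ($max_mem, $static_mb, $sizes_bytes, $errors, $write_conf) = @_;

    my $total_mb = $static_mb;
    $total_mb += $_ / 1024 / 1024 for values %$sizes_bytes;

    # int() so that a fractional MB value never ends up in the config.
    my $mem = { max => $max_mem, virtio => 1, current => int($total_mb) };
    $write_conf->($mem);

    if (@$errors) {
        my @msgs = map { my $m = $_; chomp $m; $m } @$errors;
        die "error modifying virtio-mem memory: " . join('; ', @msgs) . "\n";
    }
    return;
}
```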
> +
> + } elsif ($value > $memory) {
>
> my $numa_hostmap;
>
> @@ -324,14 +380,15 @@ sub qemu_dimm_list {
> }
>
> sub config {
> - my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd) = @_;
> + my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $devices, $bridges, $arch, $machine_type) = @_;
>
> my $memory = get_current_memory($conf);
>
> my $static_memory = get_static_mem($conf);
> +
> my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
>
> - if ($hotplug_features->{memory} || defined($confmem->{max})) {
> + if ($hotplug_features->{memory} || defined($confmem->{max}) || defined($confmem->{virtio})) {
Again, should we even bother attaching the devices if hotplug is not
enabled?
> die "NUMA needs to be enabled for memory hotplug\n" if !$conf->{numa};
> my $MAX_MEM = get_max_mem($conf);
> die "Total memory is bigger than ${MAX_MEM}MB\n" if $memory > $MAX_MEM;
> @@ -342,8 +399,12 @@ sub config {
> }
>
> die "minimum memory must be ${static_memory}MB\n" if($memory < $static_memory);
> +
> + my $cmdstr = "size=${static_memory}";
> my $slots = $confmem->{max} ? 64 : 255;
> - push @$cmd, '-m', "size=${static_memory},slots=$slots,maxmem=${MAX_MEM}M";
> + $cmdstr .= ",slots=$slots" if !$confmem->{'virtio'};
> + $cmdstr .= ",maxmem=${MAX_MEM}M";
> + push @$cmd, '-m', $cmdstr;
>
> } else {
> push @$cmd, '-m', $static_memory;
> @@ -412,7 +473,26 @@ sub config {
> }
> }
>
> - if ($hotplug_features->{memory} || $confmem->{max}) {
> + if ($confmem->{'virtio'}) {
> + my $MAX_MEM = get_max_mem($conf);
> + my $node_maxmem = ($MAX_MEM - $static_memory) / $sockets;
> + my $node_mem = ($memory - $static_memory) / $sockets;
> + my $blocksize = get_virtiomem_block_size($conf);
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> +
> + my $id = "virtiomem$i";
> + my $mem_object = print_mem_object($conf, "mem-$id", $node_maxmem);
> + push @$cmd, "-object" , "$mem_object,reserve=off";
> +
> + my $pciaddr = print_pci_addr($id, $bridges, $arch, $machine_type);
Can we rather handle the PCI address printing in config_to_command() and
only pass in the addresses here? That would also avoid the PCI module
usage and the new "one-time usage" parameters passed to Memory::config().
Maybe have a small helper in here, that just returns the needed $ids
depending on the config. Then in config_to_command(), call that helper,
print the PCI addresses, then call Memory::config(..., { $id1 =>
$address1, $id2 => $address2, ... }).
It might not be the nicest either, but it would at least be a little less
cluttered IMHO. But feel free to come up with something better or keep it
as-is if you really want to ;)
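A rough sketch of that split. All sub names are hypothetical.

- A helper in Memory.pm returns the needed ids.
- config_to_command() maps the ids to addresses. In real code it would call `print_pci_addr($id, $bridges, $arch, $machine_type)`; here that call is replaced by a callback.
- Memory::config() only consumes the resulting hash.

Returning the `-device` arguments as a list also lets the caller push them onto `$cmd`, like the DIMM devices.

```perl
use strict;
use warnings;

# Hypothetical helper for Memory.pm: ids of the virtio-mem devices a config needs.
sub get_virtiomem_ids {
    my ($virtio_enabled, $sockets) = @_;
    return () if !$virtio_enabled;
    return map { "virtiomem$_" } 0 .. $sockets - 1;
}

# What config_to_command() would do: resolve each id to its PCI address.
# $print_addr stands in for print_pci_addr($id, $bridges, $arch, $machine_type).
sub resolve_virtiomem_addrs {
    my ($ids, $print_addr) = @_;
    return { map { $_ => $print_addr->($_) } @$ids };
}

# What Memory::config() would do with the passed-in hash; no PCI module needed.
sub virtiomem_device_args {
    my ($addrs, $sockets, $block_size, $node_mem) = @_;
    my @args;
    for my $i (0 .. $sockets - 1) {
        my $id = "virtiomem$i";
        my $pciaddr = $addrs->{$id} // die "no PCI address for $id\n";
        push @args, '-device', "virtio-mem-pci,block-size=${block_size}M,"
            . "requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i$pciaddr";
    }
    return @args;
}
```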
> + my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i$pciaddr";
> + $mem_device .= ",prealloc=on" if $conf->{hugepages};
So prealloc=on for the device, but not prealloc=yes for the object
below[0]. Would you mind explaining it to me? I just found the part
mentioned for v7.0 here
https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html#updates
> + push @$devices, "-device", $mem_device;
The dimm devices in the other branch are not pushed onto $devices, so
this feels inconsistent. Why not add it onto $cmd too?
> + }
> +
> + } elsif ($hotplug_features->{memory} || $confmem->{max}) {
> +
> foreach_dimm($conf, $vmid, $memory, $sockets, sub {
> my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
>
> @@ -430,12 +510,16 @@ sub config {
> sub print_mem_object {
> my ($conf, $id, $size) = @_;
>
> + my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
> +
> if ($conf->{hugepages}) {
>
> my $hugepages_size = hugepages_size($conf, $size);
> my $path = hugepages_mount_path($hugepages_size);
>
> - return "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on,prealloc=yes";
> + my $object = "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on";
> + $object .= ",prealloc=yes" if !$confmem->{virtio};
[0]
> + return $object;
> } else {
> return "memory-backend-ram,id=$id,size=${size}M";
> }
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index a18b974..0187c74 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
> 'scsihw2' => { bus => 4, addr => 1 },
> 'scsihw3' => { bus => 4, addr => 2 },
> 'scsihw4' => { bus => 4, addr => 3 },
> + 'virtiomem0' => { bus => 4, addr => 4 },
> + 'virtiomem1' => { bus => 4, addr => 5 },
> + 'virtiomem2' => { bus => 4, addr => 6 },
> + 'virtiomem3' => { bus => 4, addr => 7 },
> + 'virtiomem4' => { bus => 4, addr => 8 },
> + 'virtiomem5' => { bus => 4, addr => 9 },
> + 'virtiomem6' => { bus => 4, addr => 10 },
> + 'virtiomem7' => { bus => 4, addr => 11 },
What if $conf->{sockets} > 8? Maybe mention the limitation in the
description of the 'virtio' property in the 'memory' string. Is the plan
to just add more virtiomem PCI devices in the future?
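If the limit stays for now, one option is to generate the entries and reject larger socket counts up front. This is a sketch, not what the patch does. The bus 4, addr 4..11 layout for virtiomem0..7 is taken from the hunk above. `$MAX_VIRTIOMEM` and both subs are hypothetical names. The generated pairs would be spliced into the existing `$pci_addr_map` literal.

```perl
use strict;
use warnings;

# The patch reserves bus 4, addr 4..11 for virtiomem0..7; generating the
# entries keeps that limit in a single place.
my $MAX_VIRTIOMEM = 8;

sub virtiomem_pci_addr_entries {
    return map { ("virtiomem$_" => { bus => 4, addr => 4 + $_ }) } 0 .. $MAX_VIRTIOMEM - 1;
}

# Fail early with a clear message instead of running into a missing address.
sub check_virtiomem_sockets {
    my ($sockets) = @_;
    die "virtio-mem supports at most $MAX_VIRTIOMEM sockets, got $sockets\n"
        if $sockets > $MAX_VIRTIOMEM;
    return;
}
```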
> } if !defined($pci_addr_map);
> return $pci_addr_map;
> }