From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com, aderumier@odiso.com
Subject: Re: [pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support
Date: Fri, 16 Dec 2022 14:42:15 +0100
Message-ID: <87423a6b-a17e-5ea4-9176-cd81e96c5693@proxmox.com>
In-Reply-To: <20221209192726.1499142-9-aderumier@odiso.com>

On 09.12.22 at 20:27, Alexandre Derumier wrote:
> a 4GB static memory is needed for DMA+boot memory, as this memory
> is almost always un-unpluggeable.
>
> 1 virtio-mem pci device is setup for each numa node on pci.4 bridge
>
> virtio-mem use a fixed blocksize with 32000 blocks
> Blocksize is computed from the maxmemory-4096/32000 with a minimum of
> 2MB to map THP.
> (lower blocksize = more chance to unplug memory).
>
> fixes:
> https://bugzilla.proxmox.com/show_bug.cgi?id=931
Comment 7 talks about Windows, and virtio-mem is not supported there at
the moment, so I don't think we should consider it fixed ;)
> https://bugzilla.proxmox.com/show_bug.cgi?id=2949
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
> PVE/QemuServer.pm | 8 +++-
> PVE/QemuServer/Memory.pm | 98 +++++++++++++++++++++++++++++++++++++---
> PVE/QemuServer/PCI.pm | 8 ++++
> 3 files changed, 106 insertions(+), 8 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 0d5b550..43fab29 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -285,6 +285,12 @@ my $memory_fmt = {
> optional => 1,
> enum => [@max_memory_list],
> },
> + virtio => {
> + description => "enable virtio-mem memory",
We really should mention that it's a technology preview, and that it
only works for Linux >=5.8 guests currently:
https://virtio-mem.gitlab.io/user-guide/#important-current-limitations
> + type => 'boolean',
> + optional => 1,
> + default => 0,
> + },
> };
>
> my $meta_info_fmt = {
> @@ -3898,7 +3904,7 @@ sub config_to_command {
> push @$cmd, get_cpu_options($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough);
> }
>
> - PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd);
> + PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $devices, $bridges, $arch, $machine_type);
>
> push @$cmd, '-S' if $conf->{freeze};
>
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 8bbbf07..70ab65a 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -8,6 +8,8 @@ use PVE::Exception qw(raise raise_param_exc);
>
> use PVE::QemuServer;
> use PVE::QemuServer::Monitor qw(mon_cmd);
> +use PVE::QemuServer::PCI qw(print_pci_addr);
> +
> use base qw(Exporter);
>
> our @EXPORT_OK = qw(
> @@ -27,7 +29,9 @@ my sub get_static_mem {
> my $static_memory = 0;
> my $memory = PVE::QemuServer::parse_memory($conf->{memory});
>
> - if($memory->{max}) {
> + if ($memory->{virtio}) {
> + $static_memory = 4096;
> + } elsif ($memory->{max}) {
> my $dimm_size = $memory->{max} / 64;
> #static mem can't be lower than 4G and lower than 1 dimmsize by socket
> $static_memory = $dimm_size * $sockets;
> @@ -102,6 +106,24 @@ my sub get_max_mem {
> return $cpu_max_mem;
> }
>
> +my sub get_virtiomem_block_size {
> + my ($conf) = @_;
> +
> + my $MAX_MEM = get_max_mem($conf);
> + my $static_memory = get_static_mem($conf);
> + my $memory = get_current_memory($conf);
> + #virtiomem can map 32000 block size. try to use lowerst blocksize, lower = more chance to unplug memory.
Style nit: line too long
> + my $blocksize = ($MAX_MEM - $static_memory) / 32000;
> + #round next power of 2
> + $blocksize = 2**(int(log($blocksize)/log(2))+1);
What if log($blocksize)/log(2) is exactly an integer? Do we still want
to add 1 then? If not, please use ceil(...) instead of int(...)+1. I
guess it can't happen in practice given the values that are possible
for $MAX_MEM and $static_memory, but still.
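For illustration, a minimal sketch of the difference, assuming ceil()
from the core POSIX module. The helper names are made up for this
example and are not part of the patch. The third variant is an
additional suggestion that avoids floating-point logarithms, which
can be slightly off for larger exact powers of 2.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Rounding as done in the patch: always moves to the next power of 2,
# even if the input already is one.
sub round_pow2_int_plus_one {
    my ($size) = @_;
    return 2**(int(log($size) / log(2)) + 1);
}

# Alternative with ceil(): exact powers of 2 are kept as they are
# (as long as the floating-point division is exact).
sub round_pow2_ceil {
    my ($size) = @_;
    return 2**ceil(log($size) / log(2));
}

# Integer-only variant: smallest power of 2 that is >= $size.
sub round_pow2_loop {
    my ($size) = @_;
    my $pow = 1;
    $pow *= 2 while $pow < $size;
    return $pow;
}

for my $size (3, 4, 5) {
    printf "%d: int+1 => %d, ceil => %d, loop => %d\n",
        $size,
        round_pow2_int_plus_one($size),
        round_pow2_ceil($size),
        round_pow2_loop($size);
}
```

For an input of 4, the first variant should yield 8, while the other
two keep 4.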
> + #2MB is the minimum to be aligned with THP
> + $blocksize = 2 if $blocksize < 2;
Nit: $blocksize is at least 2**1 after the previous calculation, so this
isn't really needed.
> +
> + die "memory size need to be multiple of $blocksize MB when virtio-mem is enabled" if ($memory % $blocksize != 0);
Missing newline in error message.
Style nit: line too long
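To show why the newline matters, a minimal standalone sketch (the
helper name and the message text are only for illustration): without
a trailing newline, Perl appends the file name and line number to the
message, which then ends up in the user-visible error.

```perl
use strict;
use warnings;

# Returns the message a caller would see when die is called with $msg.
sub message_of {
    my ($msg) = @_;
    eval { die $msg };
    return $@;
}

# Without trailing newline: " at <file> line <N>." gets appended.
print "without newline: ", message_of("memory size needs to be a multiple of 2 MB");

# With trailing newline: the message is passed on unchanged.
print "with newline: ", message_of("memory size needs to be a multiple of 2 MB\n");
```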
> +
> + return $blocksize;
> +}
> +
> sub get_current_memory{
> my ($conf) = @_;
>
> @@ -224,7 +246,41 @@ sub qemu_memory_hotplug {
> my $MAX_MEM = get_max_mem($conf);
> die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $value > $MAX_MEM;
>
> - if ($value > $memory) {
> + my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
> +
> + if ($confmem->{virtio}) {
> + my $blocksize = get_virtiomem_block_size($conf);
> + my $requested_size = ($value - $static_memory) / $sockets * 1024 * 1024;
> + my $totalsize = $static_memory;
> + my $err = undef;
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> +
> + my $id = "virtiomem$i";
> + my $retry = 0;
> + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/$id", property => "requested-size", value => int($requested_size));
I'd wrap the mon_cmd calls in eval and also catch errors there.
> +
> + my $size = 0;
> + while (1) {
> + sleep 1;
If there really is no good alternative to this querying loop, I'd rather
issue the qom-set command for all virtio-mem devices first, and then do
the loop. Maybe also move the sleep to the end of the loop.
> + $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/$id", property => "size");
> + $err = 1 if $retry > 5;
> + last if $size eq $requested_size || $retry > 5;
I think, $requested_size doesn't have to be a multiple of $sockets, so
this should rather be int($requested_size), which is what you set above.
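A rough sketch of how the suggestions for this loop could fit
together: qom-set for all devices first, errors caught via eval, the
sleep at the end of each round, and the same int() value used for
setting and comparing. The function name, the $max_retries parameter
and the $mon_cmd code reference (a stand-in for mon_cmd, so that the
sketch runs without a VM) are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch.
sub set_virtiomem_sizes {
    my ($mon_cmd, $vmid, $sockets, $requested_size, $max_retries) = @_;

    # Use the same integer value for setting and for comparing.
    my $target = int($requested_size);
    my @ids = map { "virtiomem$_" } (0 .. $sockets - 1);

    # First issue qom-set for all devices, catching errors per device.
    my ($pending, $errors, $sizes) = ({}, {}, {});
    for my $id (@ids) {
        eval {
            $mon_cmd->($vmid, 'qom-set', path => "/machine/peripheral/$id",
                property => 'requested-size', value => $target);
        };
        if (my $err = $@) {
            $errors->{$id} = $err;
        } else {
            $pending->{$id} = 1;
        }
    }

    # Then poll all pending devices, sleeping at the end of each round.
    for (my $retry = 0; $retry <= $max_retries && %$pending; $retry++) {
        for my $id (sort keys %$pending) {
            my $size = eval {
                $mon_cmd->($vmid, 'qom-get', path => "/machine/peripheral/$id",
                    property => 'size');
            };
            if (my $err = $@) {
                $errors->{$id} = $err;
                delete $pending->{$id};
                next;
            }
            $sizes->{$id} = $size;
            delete $pending->{$id} if $size == $target;
        }
        sleep 1 if %$pending;
    }
    $errors->{$_} = "timeout waiting for requested size\n" for keys %$pending;

    return ($sizes, $errors);
}

# Usage with a fake monitor that applies the requested size immediately.
my %fake_state;
my $fake_mon_cmd = sub {
    my ($vmid, $cmd, %args) = @_;
    if ($cmd eq 'qom-set') {
        $fake_state{$args{path}} = $args{value};
        return;
    }
    return $fake_state{$args{path}};
};

my ($sizes, $errors) = set_virtiomem_sizes(
    $fake_mon_cmd, 100, 2, 1024 * 1024 * 1024 / 3, 5);
print "$_: $sizes->{$_}\n" for sort keys %$sizes;
```

The caller could then sum up $sizes for the config update and die if
$errors is not empty.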
> + $retry++;
> + }
> + $totalsize += ($size / 1024 / 1024 );
> + }
> + #update conf after each succesfull change
> + if($err) {
But this is only done in the error case, not after each successful change.
> + my $mem = { max => $MAX_MEM, virtio => 1};
> + $mem->{current} = $totalsize;
Nit: int($totalsize) just to be sure?
> + $conf->{memory} = PVE::QemuServer::print_memory($mem);
> + PVE::QemuConfig->write_config($vmid, $conf);
> + raise_param_exc({ 'memory' => "error modify virtio memory" }) if $err;
It's not necessarily a parameter issue, please use die instead.
> + }
> + return $totalsize;
The other branches don't (explicitly) return anything.
> +
> + } elsif ($value > $memory) {
>
> my $numa_hostmap;
>
> @@ -324,14 +380,15 @@ sub qemu_dimm_list {
> }
>
> sub config {
> - my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd) = @_;
> + my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $devices, $bridges, $arch, $machine_type) = @_;
>
> my $memory = get_current_memory($conf);
>
> my $static_memory = get_static_mem($conf);
> +
> my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
>
> - if ($hotplug_features->{memory} || defined($confmem->{max})) {
> + if ($hotplug_features->{memory} || defined($confmem->{max}) || defined($confmem->{virtio})) {
Again, should we even bother attaching the devices if hotplug is not
enabled?
> die "NUMA needs to be enabled for memory hotplug\n" if !$conf->{numa};
> my $MAX_MEM = get_max_mem($conf);
> die "Total memory is bigger than ${MAX_MEM}MB\n" if $memory > $MAX_MEM;
> @@ -342,8 +399,12 @@ sub config {
> }
>
> die "minimum memory must be ${static_memory}MB\n" if($memory < $static_memory);
> +
> + my $cmdstr = "size=${static_memory}";
> my $slots = $confmem->{max} ? 64 : 255;
> - push @$cmd, '-m', "size=${static_memory},slots=$slots,maxmem=${MAX_MEM}M";
> + $cmdstr .= ",slots=$slots" if !$confmem->{'virtio'};
> + $cmdstr .= ",maxmem=${MAX_MEM}M";
> + push @$cmd, '-m', $cmdstr;
>
> } else {
> push @$cmd, '-m', $static_memory;
> @@ -412,7 +473,26 @@ sub config {
> }
> }
>
> - if ($hotplug_features->{memory} || $confmem->{max}) {
> + if ($confmem->{'virtio'}) {
> + my $MAX_MEM = get_max_mem($conf);
> + my $node_maxmem = ($MAX_MEM - $static_memory) / $sockets;
> + my $node_mem = ($memory - $static_memory) / $sockets;
> + my $blocksize = get_virtiomem_block_size($conf);
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> +
> + my $id = "virtiomem$i";
> + my $mem_object = print_mem_object($conf, "mem-$id", $node_maxmem);
> + push @$cmd, "-object" , "$mem_object,reserve=off";
> +
> + my $pciaddr = print_pci_addr($id, $bridges, $arch, $machine_type);
Can we rather handle the PCI address printing in config_to_command() and
only pass in the addresses here? That would also avoid the PCI module
usage and the new "one-time usage" parameters passed to Memory::config().
Maybe have a small helper in here that just returns the needed $ids
depending on the config. Then in config_to_command(), call that helper,
print the PCI addresses, then call Memory::config(..., { $id1 =>
$address1, $id2 => $address2, ... }).
That might not be the nicest solution either, but it would at least be a
little less cluttered IMHO. Feel free to come up with something better
or keep it as-is if you really want to ;)
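A minimal sketch of what I have in mind. The helper name
get_virtiomem_ids is made up, and $print_pci_addr is only a stand-in
for the real print_pci_addr() so that the snippet runs on its own. The
address string it returns is an assumed format, with the bus and slot
numbers taken from the map added in this patch.

```perl
use strict;
use warnings;

# Hypothetical helper for Memory.pm: returns the virtio-mem device IDs
# needed for the given parsed memory config, one per socket.
sub get_virtiomem_ids {
    my ($confmem, $sockets) = @_;
    return [] if !$confmem->{virtio};
    return [ map { "virtiomem$_" } (0 .. $sockets - 1) ];
}

# Stand-in for print_pci_addr($id, $bridges, $arch, $machine_type).
my $print_pci_addr = sub {
    my ($id) = @_;
    my ($index) = $id =~ m/(\d+)$/;
    return sprintf(",bus=pci.4,addr=0x%x", 4 + $index);
};

# In config_to_command(): build the id => address map once and pass
# only that map on to Memory::config().
my $ids = get_virtiomem_ids({ virtio => 1 }, 2);
my $pci_addresses = { map { $_ => $print_pci_addr->($_) } @$ids };

print "$_ => $pci_addresses->{$_}\n" for sort keys %$pci_addresses;
```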
> + my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i$pciaddr";
> + $mem_device .= ",prealloc=on" if $conf->{hugepages};
So prealloc=on for the device, but not prealloc=yes for the object
below[0]. Would you mind explaining it to me? I just found the part
mentioned for v7.0 here
https://virtio-mem.gitlab.io/user-guide/user-guide-qemu.html#updates
> + push @$devices, "-device", $mem_device;
The dimm devices in the other branch are not pushed onto $devices, so
this feels inconsistent. Why not add it onto $cmd too?
> + }
> +
> + } elsif ($hotplug_features->{memory} || $confmem->{max}) {
> +
> foreach_dimm($conf, $vmid, $memory, $sockets, sub {
> my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
>
> @@ -430,12 +510,16 @@ sub config {
> sub print_mem_object {
> my ($conf, $id, $size) = @_;
>
> + my $confmem = PVE::QemuServer::parse_memory($conf->{memory});
> +
> if ($conf->{hugepages}) {
>
> my $hugepages_size = hugepages_size($conf, $size);
> my $path = hugepages_mount_path($hugepages_size);
>
> - return "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on,prealloc=yes";
> + my $object = "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on";
> + $object .= ",prealloc=yes" if !$confmem->{virtio};
[0]
> + return $object;
> } else {
> return "memory-backend-ram,id=$id,size=${size}M";
> }
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index a18b974..0187c74 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
> 'scsihw2' => { bus => 4, addr => 1 },
> 'scsihw3' => { bus => 4, addr => 2 },
> 'scsihw4' => { bus => 4, addr => 3 },
> + 'virtiomem0' => { bus => 4, addr => 4 },
> + 'virtiomem1' => { bus => 4, addr => 5 },
> + 'virtiomem2' => { bus => 4, addr => 6 },
> + 'virtiomem3' => { bus => 4, addr => 7 },
> + 'virtiomem4' => { bus => 4, addr => 8 },
> + 'virtiomem5' => { bus => 4, addr => 9 },
> + 'virtiomem6' => { bus => 4, addr => 10 },
> + 'virtiomem7' => { bus => 4, addr => 11 },
What if $conf->{sockets} > 8? Maybe mention the limitation in the
description of the 'virtio' property in the 'memory' string. Is the plan
to just add on more virtiomem PCI devices in the future?
> } if !defined($pci_addr_map);
> return $pci_addr_map;
> }