From: Fiona Ebner <f.ebner@proxmox.com>
To: pve-devel@lists.proxmox.com, "aderumier@odiso.com" <aderumier@odiso.com>
Subject: Re: [pve-devel] [PATCH v3 qemu-server 11/13] memory: add virtio-mem support
Date: Fri, 3 Feb 2023 14:46:36 +0100
Message-ID: <f9758e76-2a9c-cef9-412e-1b85a5d70c8b@proxmox.com>
In-Reply-To: <20230202110344.840195-12-aderumier@odiso.com>
On 02.02.23 at 12:03, Alexandre Derumier wrote:
> @@ -157,6 +168,109 @@ sub get_current_memory {
> return $memory->{current};
> }
>
> +sub get_virtiomem_block_size {
> + my ($conf) = @_;
> +
> + my $sockets = $conf->{sockets} || 1;
> + my $MAX_MEM = get_max_mem($conf);
> + my $static_memory = get_static_mem($conf, $sockets);
> + my $memory = get_current_memory($conf->{memory});
> +
> + #virtiomem can map 32000 block size.
> + #try to use lowest blocksize, lower = more chance to unplug memory.
> + my $blocksize = ($MAX_MEM - $static_memory) / 32000;
> + #2MB is the minimum to be aligned with THP
> + $blocksize = 2 if $blocksize < 2;
> + $blocksize = 2**(ceil(log($blocksize)/log(2)));
> + #Linux guest kernel only support 4MiB block currently (kernel <= 6.2)
> + $blocksize = 4 if $blocksize < 4;
> +
> + return $blocksize;
> +}
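For reference, a standalone sketch of the block-size arithmetic above. It assumes the config lookups are replaced by plain arguments in MiB; the sub name and the sample sizes are made up for illustration and are not part of the patch.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical standalone variant of get_virtiomem_block_size():
# takes sizes in MiB instead of reading them from $conf.
sub virtiomem_block_size {
    my ($max_mem, $static_memory) = @_;

    # spread the hotpluggable range over at most 32000 blocks
    my $blocksize = ($max_mem - $static_memory) / 32000;
    # 2 MiB minimum to stay aligned with THP
    $blocksize = 2 if $blocksize < 2;
    # round up to the next power of two
    $blocksize = 2**(ceil(log($blocksize) / log(2)));
    # Linux guests need at least 4 MiB blocks (kernel <= 6.2)
    $blocksize = 4 if $blocksize < 4;

    return $blocksize;
}

# sample maximum sizes in GiB, each with 1024 MiB of static memory
for my $max_gib (64, 256, 1024) {
    my $bs = virtiomem_block_size($max_gib * 1024, 1024);
    print "max $max_gib GiB -> block size $bs MiB\n";
}
```

With 1024 MiB static memory, the three sample maxima come out as 4, 16 and 64 MiB blocks.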
> +
> +my sub get_virtiomem_total_current_size {
> + my ($mems) = @_;
> + my $size = 0;
> + for my $mem (values %$mems) {
> + $size += $mem->{current};
> + }
> + return $size;
> +}
> +
> +my sub balance_virtiomem {
The function should die if not all memory can be plugged.
> + my ($vmid, $virtiomems, $blocksize, $target_virtiomem_total) = @_;
> +
> + my $nb_virtiomem = keys %$virtiomems;
Style nit: explicit scalar() would be nice.
> +
> + print"try to balance memory on $nb_virtiomem virtiomems\n";
> + die "error. no more available blocks in virtiomem to balance the remaining memory" if $target_virtiomem_total < 0;
> +
> + #if we can't share exactly the same amount, we add the remainder on last node
> + my $virtiomem_target_aligned = int( $target_virtiomem_total / $nb_virtiomem / $blocksize) * $blocksize;
> + my $virtiomem_target_remaining = $target_virtiomem_total - ($virtiomem_target_aligned * ($nb_virtiomem-1));
> +
> + my $i = 0;
> + foreach my $id (sort keys %$virtiomems) {
> + my $virtiomem = $virtiomems->{$id};
> + $i++;
> + my $virtiomem_target = $i == $nb_virtiomem ? $virtiomem_target_remaining : $virtiomem_target_aligned;
> + $virtiomem->{completed} = 0;
> + $virtiomem->{retry} = 0;
> + $virtiomem->{target} = $virtiomem_target;
> +
> + print "virtiomem$id: set-requested-size : $virtiomem_target\n";
> + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem_target * 1024 * 1024);
Style nit: some lines are over 100 characters, some by quite a bit.
> + }
> +
> + while (1) {
> +
> + sleep 1;
> + my $total_finished = 0;
> +
> + foreach my $id (keys %$virtiomems) {
> +
> + my $virtiomem = $virtiomems->{$id};
> +
> + if ($virtiomem->{error} || $virtiomem->{completed}) {
> + $total_finished++;
> + next;
> + }
> +
> + my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$id", property => "size");
> + $virtiomem->{current} = $size / 1024 / 1024;
> + print"virtiomem$id: virtiomem->last: $virtiomem->{last} virtiomem->current: $virtiomem->{current} virtio_mem_target:$virtiomem->{target}\n";
> +
> + if($virtiomem->{current} == $virtiomem->{target}) {
> + print"virtiomem$id: completed\n";
> + $virtiomem->{completed} = 1;
> + next;
> + }
> +
> + if($virtiomem->{current} != $virtiomem->{last}) {
> + #if value has changed, but not yet completed
> + print "virtiomem$id: changed but don't not reach target yet\n";
> + $virtiomem->{retry} = 0;
> + $virtiomem->{last} = $virtiomem->{current};
> + next;
> + }
> +
> + if($virtiomem->{retry} >= 5) {
> + print "virtiomem$id: too many retry. set error\n";
> + $virtiomem->{error} = 1;
> + #as change is async, we don't want that value change after the api call
> + eval {
> + mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem->{current} * 1024 *1024);
> + };
> + }
> + print"virtiomem$id: increase retry: $virtiomem->{retry}\n";
> + $virtiomem->{retry}++;
> + }
> +
> + my $nb_virtiomem = keys %$virtiomems;
This redeclares $nb_virtiomem. The number can't have changed since the
first declaration, or am I missing something?
> + return if $total_finished == $nb_virtiomem;
Style nit: could also use
while ($total_finished != $nb_virtiomem)
at the beginning of the loop.
> + }
> +}
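To illustrate the comments on this function (die if not everything could be plugged, explicit scalar(), loop condition instead of an early return), here is a rough sketch. It is not meant as the final implementation: poll_device() is a hypothetical stand-in for the qom-get/qom-set handling, the sleep is left out, and a device is counted as finished in the same iteration it completes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-device polling done via mon_cmd():
# marks a device completed once current matches target, or flags an
# error after more than 5 polls without reaching it.
sub poll_device {
    my ($dev) = @_;
    if ($dev->{current} == $dev->{target}) {
        $dev->{completed} = 1;
    } elsif (++$dev->{retry} > 5) {
        $dev->{error} = 1;
    }
}

sub wait_for_devices {
    my ($devices) = @_;

    my $nb_devices = scalar(keys %$devices);
    my $total_finished = 0;

    while ($total_finished != $nb_devices) {
        # the real code would sleep between polls
        $total_finished = 0;
        for my $id (sort keys %$devices) {
            my $dev = $devices->{$id};
            poll_device($dev) if !($dev->{error} || $dev->{completed});
            $total_finished++ if $dev->{error} || $dev->{completed};
        }
    }

    # fail loudly if any device could not reach its target
    my @failed = grep { $devices->{$_}->{error} } sort keys %$devices;
    die "virtiomem devices did not reach target: " . join(', ', @failed) . "\n"
        if @failed;
}

# made-up sample state: device 1 never reaches its target
my $devices = {
    0 => { current => 512, target => 512, retry => 0 },
    1 => { current => 256, target => 512, retry => 0 },
};
eval { wait_for_devices($devices) };
print "error: $@" if $@;
```

The caller in qemu_memory_hotplug already wraps the call in eval and writes the config before re-raising, so dying here would still record the memory that was actually plugged.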
> +
> sub get_numa_node_list {
> my ($conf) = @_;
> my @numa_map;
> @@ -237,7 +351,39 @@ sub qemu_memory_hotplug {
> my $MAX_MEM = get_max_mem($conf);
> die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $value > $MAX_MEM;
>
> - if ($value > $memory) {
> + my $confmem = parse_memory($conf->{memory});
Can't you just re-use the existing $oldmem here?
> +
> + if ($confmem->{virtio}) {
> + my $blocksize = get_virtiomem_block_size($conf);
> +
> + my $virtiomems = {};
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> + my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$i", property => "size");
> + $size = $size / 1024 /1024;
> + $virtiomems->{$i} = {
> + current => $size,
> + last => $size,
> + error => 0,
> + completed => 0,
> + retry => 0
> + };
> + }
> +
> + my $target_virtiomem_total = $value - $static_memory;
> + my $err;
> + eval {
> + balance_virtiomem($vmid, $virtiomems, $blocksize, $target_virtiomem_total);
> + };
> + $err = $@ if $@;
> +
> + my $current_memory = $static_memory + get_virtiomem_total_current_size($virtiomems);
> + $newmem->{current} = $current_memory;
> + $conf->{memory} = print_memory($newmem);
> + PVE::QemuConfig->write_config($vmid, $conf);
> + die $err if $err;
> +
> + } elsif ($value > $memory) {
>
> my $numa_hostmap;
>
> @@ -441,17 +590,42 @@ sub config {
> }
>
> if ($hotplug) {
> - foreach_dimm($conf, $vmid, $memory, $static_memory, sub {
> - my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
>
> - my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size);
> + my $confmem = parse_memory($conf->{memory});
> +
> + if ($confmem->{'virtio'}) {
>
> - push @$cmd, "-object" , $mem_object;
> - push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode";
> + my $MAX_MEM = get_max_mem($conf);
> + my $node_maxmem = ($MAX_MEM - $static_memory) / $sockets;
> + my $node_mem = ($memory - $static_memory) / $sockets;
Nit: If the number of $sockets is not a power of 2, I think this breaks.
But I guess we already don't support that. Running the current version
without your patches (for a VM with memory hotplug):
root@pve701 ~ # qm set 131 --sockets 3
update VM 131: -sockets 3
root@pve701 ~ # qm set 131 -memory 8192
update VM 131: -memory 8192
root@pve701 ~ # qm start 131
kvm: total memory for NUMA nodes (0x3fffffff) should equal RAM size (0x40000000)
start failed: QEMU exited with code 1
I guess we can just fix it up together with the existing rounding issue
if/when somebody complains about it ;)
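The values in the error above are consistent with 1 GiB being split three ways and each share rounded down to whole bytes. Should we ever want to support this, one option would be to hand out block-size multiples per node and put the remainder on the last node, like balance_virtiomem already does. A rough sketch, where split_per_node() and the sample sizes are hypothetical and not part of the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: split $total MiB over $nodes in multiples of
# $blocksize, putting the remainder on the last node.
sub split_per_node {
    my ($total, $nodes, $blocksize) = @_;
    die "total must be a multiple of the block size\n" if $total % $blocksize;

    my $aligned = int($total / $nodes / $blocksize) * $blocksize;
    my @sizes = ($aligned) x ($nodes - 1);
    push @sizes, $total - $aligned * ($nodes - 1);
    return @sizes;
}

# The numbers from the error: 1 GiB over 3 nodes, rounded down to bytes.
my $per_node_bytes = int(0x40000000 / 3);
printf "0x%x\n", 3 * $per_node_bytes;    # prints 0x3fffffff

# 7168 MiB of hotpluggable memory, 3 sockets, 4 MiB blocks
print join(' + ', split_per_node(7168, 3, 4)), " MiB\n";
```

The last line prints "2388 + 2388 + 2392 MiB": every share is a multiple of the block size and the shares add up to the total again.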
> + my $blocksize = get_virtiomem_block_size($conf);
>
> - die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n"
> - if $current_size > $memory;
> - });
> + die "memory need to be a multiple of $blocksize MiB with maxmemory $MAX_MEM MiB when virtiomem is enabled\n"
> + if $memory % $blocksize != 0;
> +
> + for (my $i = 0; $i < $sockets; $i++) {
> +
> + my $id = "virtiomem$i";
> + my $mem_object = print_mem_object($conf, "mem-$id", $node_maxmem);
> + push @$cmd, "-object" , "$mem_object,reserve=off";
> +
> + my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i";
> + $mem_device .= ",prealloc=on" if $conf->{hugepages};
> + $mem_devices->{$id} = $mem_device;
> + }
> + } else {
> + foreach_dimm($conf, $vmid, $memory, $static_memory, sub {
> + my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
> +
> + my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size);
> +
> + push @$cmd, "-object" , $mem_object;
> + push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode";
> +
> + die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n"
> + if $current_size > $memory;
> + });
> + }
> }
> }
>
Thread overview: 24+ messages
2023-02-02 11:03 [pve-devel] [PATCH v3 qemu-server 00/13] rework memory hotplug + virtiomem Alexandre Derumier
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 01/13] memory: extract some code to their own sub for mocking Alexandre Derumier
2023-02-03 13:44 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 02/13] tests: add memory tests Alexandre Derumier
2023-02-03 13:44 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 03/13] qemu_memory_hotplug: remove unused $opt arg Alexandre Derumier
2023-02-03 13:56 ` [pve-devel] applied: " Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 04/13] add memory parser Alexandre Derumier
2023-02-03 13:44 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 05/13] memory: add get_static_mem && remove parse_hotplug_features Alexandre Derumier
2023-02-03 13:44 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 06/13] config: memory: add 'max' option Alexandre Derumier
2023-02-03 13:44 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 07/13] memory: get_max_mem: use config memory max Alexandre Derumier
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 08/13] memory: don't use foreach_reversedimm for unplug Alexandre Derumier
2023-02-03 13:45 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 09/13] memory: use 64 slots && static dimm size when max is defined Alexandre Derumier
2023-02-03 13:45 ` Fiona Ebner
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 10/13] test: add memory-max tests Alexandre Derumier
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 11/13] memory: add virtio-mem support Alexandre Derumier
2023-02-03 13:46 ` Fiona Ebner [this message]
2023-02-03 15:48 ` DERUMIER, Alexandre
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 12/13] memory: virtio-mem : implement redispatch retry Alexandre Derumier
2023-02-02 11:03 ` [pve-devel] [PATCH v3 qemu-server 13/13] tests: add virtio-mem tests Alexandre Derumier