public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem
@ 2023-01-04  6:42 Alexandre Derumier
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests Alexandre Derumier
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

This patch series reworks the current memory hotplug and adds virtio-mem support.

The memory option now takes extra sub-options:

memory: [[current=]<integer>] [,max=<integer>] [,virtio=<1|0>]
e.g. memory: current=1024,max=131072,virtio=1
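
For backward compatibility, a plain integer still parses as the
'current' value via the format's default_key; a minimal illustration
using parse_memory() from patch 2:

    # 'memory: 1024' and 'memory: current=1024' are equivalent:
    my $mem = PVE::QemuServer::Memory::parse_memory($conf->{memory});
    my $current = $mem->{current};    # 1024 in both cases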


patches 1-2: add memory tests and a memory parser

patches 3-7: add the 'max' option with 64 static-size dimm hotplug

For classic memory hotplug, when 'max' is defined,
we use 64 fixed-size dimms.
The 'max' option must be a multiple of 64GB.

patches 8-9: add virtio-mem

The 'virtio' option enables the new virtio-mem support:
instead of plugging/unplugging dimms, blocks are added/removed
inside one big dimm.
virtio-mem can use up to 32000 blocks; the block size is computed
from the max memory.
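
A rough sketch of how the block size could scale with the max memory
(an illustration from the numbers in this series, not the exact
patch 8 code):

    # smallest power-of-2 block size (>= 4MB for current Linux guests)
    # such that the max memory fits in at most 32000 blocks
    sub compute_blocksize {
        my ($max_mb) = @_;
        my $blocksize = 4;    # MB
        $blocksize *= 2 while ($max_mb / $blocksize) > 32000;
        return $blocksize;    # e.g. 64G -> 4MB, 512G -> 32MB
    }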


Changelog v2:

Updated different patches based on Fiona's comments.
(I have sent 2 other mails with comments that are not yet addressed.)

The biggest change is in virtio-mem: instead of trying to keep the same
amount of memory on each virtio-mem device (some blocks could be
unmovable and break unplug), we now try to balance/dispatch the
remaining blocks across the other available virtio-mem devices, as
sketched below.
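
As a purely hypothetical sketch of that dispatch (helper and field
names are made up for illustration; see patch 8 for the real code):

    # spread a memory delta across the virtio-mem devices that can
    # still be resized, instead of forcing an equal share on each
    sub balance_delta {
        my ($devices, $delta) = @_;
        my @avail = grep { !$_->{unmovable} } @$devices;
        return if !@avail;
        my $share = int($delta / scalar(@avail));
        $_->{target} += $share for @avail;
        $avail[0]->{target} += $delta - $share * scalar(@avail);    # remainder
    }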

Also, the minimum block size currently supported by the Linux guest OS
is 4MB, even though virtio-mem can use 2MB on the QEMU side.

Patch 10 with the hotplug fix has been merged into the other patches.


Alexandre Derumier (9):
  test: add memory tests
  add memory parser
  memory: add get_static_mem
  config: memory: add 'max' option
  memory: get_max_mem: use config memory max
  memory: use 64 slots && static dimm size when max is defined
  test: add memory-max tests
  memory: add virtio-mem support
  tests: add virtio-mem tests

 PVE/API2/Qemu.pm                              |  46 +-
 PVE/QemuConfig.pm                             |   4 +-
 PVE/QemuMigrate.pm                            |   6 +-
 PVE/QemuServer.pm                             |  31 +-
 PVE/QemuServer/Helpers.pm                     |   3 +-
 PVE/QemuServer/Memory.pm                      | 424 +++++++++++++++---
 PVE/QemuServer/PCI.pm                         |   8 +
 test/cfg2cmd/memory-hotplug-hugepages.conf    |  12 +
 .../cfg2cmd/memory-hotplug-hugepages.conf.cmd |  62 +++
 test/cfg2cmd/memory-hotplug.conf              |  11 +
 test/cfg2cmd/memory-hotplug.conf.cmd          | 174 +++++++
 test/cfg2cmd/memory-hugepages-1g.conf         |  11 +
 test/cfg2cmd/memory-hugepages-1g.conf.cmd     |  30 ++
 test/cfg2cmd/memory-hugepages-2m.conf         |  11 +
 test/cfg2cmd/memory-hugepages-2m.conf.cmd     |  30 ++
 test/cfg2cmd/memory-max-128G.conf             |  11 +
 test/cfg2cmd/memory-max-128G.conf.cmd         |  86 ++++
 test/cfg2cmd/memory-max-512G.conf             |  11 +
 test/cfg2cmd/memory-max-512G.conf.cmd         |  58 +++
 test/cfg2cmd/memory-virtio-hugepages-1G.conf  |  12 +
 .../memory-virtio-hugepages-1G.conf.cmd       |  35 ++
 test/cfg2cmd/memory-virtio-max.conf           |  11 +
 test/cfg2cmd/memory-virtio-max.conf.cmd       |  35 ++
 test/cfg2cmd/memory-virtio.conf               |  11 +
 test/cfg2cmd/memory-virtio.conf.cmd           |  35 ++
 test/run_config2command_tests.pl              |  21 +
 26 files changed, 1107 insertions(+), 82 deletions(-)
 create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf
 create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hotplug.conf
 create mode 100644 test/cfg2cmd/memory-hotplug.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf
 create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf
 create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf.cmd
 create mode 100644 test/cfg2cmd/memory-max-128G.conf
 create mode 100644 test/cfg2cmd/memory-max-128G.conf.cmd
 create mode 100644 test/cfg2cmd/memory-max-512G.conf
 create mode 100644 test/cfg2cmd/memory-max-512G.conf.cmd
 create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf
 create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd
 create mode 100644 test/cfg2cmd/memory-virtio-max.conf
 create mode 100644 test/cfg2cmd/memory-virtio-max.conf.cmd
 create mode 100644 test/cfg2cmd/memory-virtio.conf
 create mode 100644 test/cfg2cmd/memory-virtio.conf.cmd

-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
@ 2023-01-04  6:42 ` Alexandre Derumier
  2023-01-24 13:04   ` Fiona Ebner
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser Alexandre Derumier
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/QemuServer/Memory.pm                      |  11 +-
 test/cfg2cmd/memory-hotplug-hugepages.conf    |  12 ++
 .../cfg2cmd/memory-hotplug-hugepages.conf.cmd |  62 +++++++
 test/cfg2cmd/memory-hotplug.conf              |  11 ++
 test/cfg2cmd/memory-hotplug.conf.cmd          | 174 ++++++++++++++++++
 test/cfg2cmd/memory-hugepages-1g.conf         |  11 ++
 test/cfg2cmd/memory-hugepages-1g.conf.cmd     |  30 +++
 test/cfg2cmd/memory-hugepages-2m.conf         |  11 ++
 test/cfg2cmd/memory-hugepages-2m.conf.cmd     |  30 +++
 test/run_config2command_tests.pl              |  21 +++
 10 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf
 create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hotplug.conf
 create mode 100644 test/cfg2cmd/memory-hotplug.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf
 create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf.cmd
 create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf
 create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf.cmd

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index f8fc534..6c1cd94 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -348,7 +348,7 @@ sub config {
 	    my $numa_memory = ($static_memory / $sockets);
 
 	    for (my $i = 0; $i < $sockets; $i++)  {
-		die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/" && $conf->{hugepages};
+		die "host NUMA node$i doesn't exist\n" if !host_numanode_exist($i) && $conf->{hugepages};
 
 		my $mem_object = print_mem_object($conf, "ram-node$i", $numa_memory);
 		push @$cmd, '-object', $mem_object;
@@ -391,6 +391,13 @@ sub print_mem_object {
 
 }
 
+sub host_numanode_exist {
+    my ($id) = @_;
+
+    return if ! -d "/sys/devices/system/node/node$id/";
+    return 1;
+}
+
 sub print_numa_hostnodes {
     my ($hostnodelists) = @_;
 
@@ -402,7 +409,7 @@ sub print_numa_hostnodes {
 	$hostnodes .= "-$end" if defined($end);
 	$end //= $start;
 	for (my $i = $start; $i <= $end; ++$i ) {
-	    die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";
+	    die "host NUMA node$i doesn't exist\n" if !host_numanode_exist($i);
 	}
     }
     return $hostnodes;
diff --git a/test/cfg2cmd/memory-hotplug-hugepages.conf b/test/cfg2cmd/memory-hotplug-hugepages.conf
new file mode 100644
index 0000000..6cba31e
--- /dev/null
+++ b/test/cfg2cmd/memory-hotplug-hugepages.conf
@@ -0,0 +1,12 @@
+# TEST: memory hotplug with 1GB hugepage
+# QEMU_VERSION: 3.0
+cores: 2
+memory: 18432
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
+hugepages: 1024
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-hotplug-hugepages.conf.cmd b/test/cfg2cmd/memory-hotplug-hugepages.conf.cmd
new file mode 100644
index 0000000..6d4d8e8
--- /dev/null
+++ b/test/cfg2cmd/memory-hotplug-hugepages.conf.cmd
@@ -0,0 +1,62 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=2048,slots=255,maxmem=524288M' \
+  -object 'memory-backend-file,id=ram-node0,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-file,id=ram-node1,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-file,id=mem-dimm0,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' \
+  -object 'memory-backend-file,id=mem-dimm1,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=1' \
+  -object 'memory-backend-file,id=mem-dimm2,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm2,memdev=mem-dimm2,node=0' \
+  -object 'memory-backend-file,id=mem-dimm3,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm3,memdev=mem-dimm3,node=1' \
+  -object 'memory-backend-file,id=mem-dimm4,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm4,memdev=mem-dimm4,node=0' \
+  -object 'memory-backend-file,id=mem-dimm5,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm5,memdev=mem-dimm5,node=1' \
+  -object 'memory-backend-file,id=mem-dimm6,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm6,memdev=mem-dimm6,node=0' \
+  -object 'memory-backend-file,id=mem-dimm7,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm7,memdev=mem-dimm7,node=1' \
+  -object 'memory-backend-file,id=mem-dimm8,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm8,memdev=mem-dimm8,node=0' \
+  -object 'memory-backend-file,id=mem-dimm9,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm9,memdev=mem-dimm9,node=1' \
+  -object 'memory-backend-file,id=mem-dimm10,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm10,memdev=mem-dimm10,node=0' \
+  -object 'memory-backend-file,id=mem-dimm11,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm11,memdev=mem-dimm11,node=1' \
+  -object 'memory-backend-file,id=mem-dimm12,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm12,memdev=mem-dimm12,node=0' \
+  -object 'memory-backend-file,id=mem-dimm13,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm13,memdev=mem-dimm13,node=1' \
+  -object 'memory-backend-file,id=mem-dimm14,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm14,memdev=mem-dimm14,node=0' \
+  -object 'memory-backend-file,id=mem-dimm15,size=1024M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -device 'pc-dimm,id=dimm15,memdev=mem-dimm15,node=1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc'
diff --git a/test/cfg2cmd/memory-hotplug.conf b/test/cfg2cmd/memory-hotplug.conf
new file mode 100644
index 0000000..386e61f
--- /dev/null
+++ b/test/cfg2cmd/memory-hotplug.conf
@@ -0,0 +1,11 @@
+# TEST: basic memory hotplug
+# QEMU_VERSION: 3.0
+cores: 2
+memory: 66560
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
diff --git a/test/cfg2cmd/memory-hotplug.conf.cmd b/test/cfg2cmd/memory-hotplug.conf.cmd
new file mode 100644
index 0000000..2da7955
--- /dev/null
+++ b/test/cfg2cmd/memory-hotplug.conf.cmd
@@ -0,0 +1,174 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=1024,slots=255,maxmem=524288M' \
+  -object 'memory-backend-ram,id=ram-node0,size=512M' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-ram,id=ram-node1,size=512M' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-ram,id=mem-dimm0,size=512M' \
+  -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm1,size=512M' \
+  -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm2,size=512M' \
+  -device 'pc-dimm,id=dimm2,memdev=mem-dimm2,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm3,size=512M' \
+  -device 'pc-dimm,id=dimm3,memdev=mem-dimm3,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm4,size=512M' \
+  -device 'pc-dimm,id=dimm4,memdev=mem-dimm4,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm5,size=512M' \
+  -device 'pc-dimm,id=dimm5,memdev=mem-dimm5,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm6,size=512M' \
+  -device 'pc-dimm,id=dimm6,memdev=mem-dimm6,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm7,size=512M' \
+  -device 'pc-dimm,id=dimm7,memdev=mem-dimm7,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm8,size=512M' \
+  -device 'pc-dimm,id=dimm8,memdev=mem-dimm8,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm9,size=512M' \
+  -device 'pc-dimm,id=dimm9,memdev=mem-dimm9,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm10,size=512M' \
+  -device 'pc-dimm,id=dimm10,memdev=mem-dimm10,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm11,size=512M' \
+  -device 'pc-dimm,id=dimm11,memdev=mem-dimm11,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm12,size=512M' \
+  -device 'pc-dimm,id=dimm12,memdev=mem-dimm12,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm13,size=512M' \
+  -device 'pc-dimm,id=dimm13,memdev=mem-dimm13,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm14,size=512M' \
+  -device 'pc-dimm,id=dimm14,memdev=mem-dimm14,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm15,size=512M' \
+  -device 'pc-dimm,id=dimm15,memdev=mem-dimm15,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm16,size=512M' \
+  -device 'pc-dimm,id=dimm16,memdev=mem-dimm16,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm17,size=512M' \
+  -device 'pc-dimm,id=dimm17,memdev=mem-dimm17,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm18,size=512M' \
+  -device 'pc-dimm,id=dimm18,memdev=mem-dimm18,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm19,size=512M' \
+  -device 'pc-dimm,id=dimm19,memdev=mem-dimm19,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm20,size=512M' \
+  -device 'pc-dimm,id=dimm20,memdev=mem-dimm20,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm21,size=512M' \
+  -device 'pc-dimm,id=dimm21,memdev=mem-dimm21,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm22,size=512M' \
+  -device 'pc-dimm,id=dimm22,memdev=mem-dimm22,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm23,size=512M' \
+  -device 'pc-dimm,id=dimm23,memdev=mem-dimm23,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm24,size=512M' \
+  -device 'pc-dimm,id=dimm24,memdev=mem-dimm24,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm25,size=512M' \
+  -device 'pc-dimm,id=dimm25,memdev=mem-dimm25,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm26,size=512M' \
+  -device 'pc-dimm,id=dimm26,memdev=mem-dimm26,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm27,size=512M' \
+  -device 'pc-dimm,id=dimm27,memdev=mem-dimm27,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm28,size=512M' \
+  -device 'pc-dimm,id=dimm28,memdev=mem-dimm28,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm29,size=512M' \
+  -device 'pc-dimm,id=dimm29,memdev=mem-dimm29,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm30,size=512M' \
+  -device 'pc-dimm,id=dimm30,memdev=mem-dimm30,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm31,size=512M' \
+  -device 'pc-dimm,id=dimm31,memdev=mem-dimm31,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm32,size=1024M' \
+  -device 'pc-dimm,id=dimm32,memdev=mem-dimm32,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm33,size=1024M' \
+  -device 'pc-dimm,id=dimm33,memdev=mem-dimm33,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm34,size=1024M' \
+  -device 'pc-dimm,id=dimm34,memdev=mem-dimm34,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm35,size=1024M' \
+  -device 'pc-dimm,id=dimm35,memdev=mem-dimm35,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm36,size=1024M' \
+  -device 'pc-dimm,id=dimm36,memdev=mem-dimm36,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm37,size=1024M' \
+  -device 'pc-dimm,id=dimm37,memdev=mem-dimm37,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm38,size=1024M' \
+  -device 'pc-dimm,id=dimm38,memdev=mem-dimm38,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm39,size=1024M' \
+  -device 'pc-dimm,id=dimm39,memdev=mem-dimm39,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm40,size=1024M' \
+  -device 'pc-dimm,id=dimm40,memdev=mem-dimm40,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm41,size=1024M' \
+  -device 'pc-dimm,id=dimm41,memdev=mem-dimm41,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm42,size=1024M' \
+  -device 'pc-dimm,id=dimm42,memdev=mem-dimm42,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm43,size=1024M' \
+  -device 'pc-dimm,id=dimm43,memdev=mem-dimm43,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm44,size=1024M' \
+  -device 'pc-dimm,id=dimm44,memdev=mem-dimm44,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm45,size=1024M' \
+  -device 'pc-dimm,id=dimm45,memdev=mem-dimm45,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm46,size=1024M' \
+  -device 'pc-dimm,id=dimm46,memdev=mem-dimm46,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm47,size=1024M' \
+  -device 'pc-dimm,id=dimm47,memdev=mem-dimm47,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm48,size=1024M' \
+  -device 'pc-dimm,id=dimm48,memdev=mem-dimm48,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm49,size=1024M' \
+  -device 'pc-dimm,id=dimm49,memdev=mem-dimm49,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm50,size=1024M' \
+  -device 'pc-dimm,id=dimm50,memdev=mem-dimm50,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm51,size=1024M' \
+  -device 'pc-dimm,id=dimm51,memdev=mem-dimm51,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm52,size=1024M' \
+  -device 'pc-dimm,id=dimm52,memdev=mem-dimm52,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm53,size=1024M' \
+  -device 'pc-dimm,id=dimm53,memdev=mem-dimm53,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm54,size=1024M' \
+  -device 'pc-dimm,id=dimm54,memdev=mem-dimm54,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm55,size=1024M' \
+  -device 'pc-dimm,id=dimm55,memdev=mem-dimm55,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm56,size=1024M' \
+  -device 'pc-dimm,id=dimm56,memdev=mem-dimm56,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm57,size=1024M' \
+  -device 'pc-dimm,id=dimm57,memdev=mem-dimm57,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm58,size=1024M' \
+  -device 'pc-dimm,id=dimm58,memdev=mem-dimm58,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm59,size=1024M' \
+  -device 'pc-dimm,id=dimm59,memdev=mem-dimm59,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm60,size=1024M' \
+  -device 'pc-dimm,id=dimm60,memdev=mem-dimm60,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm61,size=1024M' \
+  -device 'pc-dimm,id=dimm61,memdev=mem-dimm61,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm62,size=1024M' \
+  -device 'pc-dimm,id=dimm62,memdev=mem-dimm62,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm63,size=1024M' \
+  -device 'pc-dimm,id=dimm63,memdev=mem-dimm63,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm64,size=2048M' \
+  -device 'pc-dimm,id=dimm64,memdev=mem-dimm64,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm65,size=2048M' \
+  -device 'pc-dimm,id=dimm65,memdev=mem-dimm65,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm66,size=2048M' \
+  -device 'pc-dimm,id=dimm66,memdev=mem-dimm66,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm67,size=2048M' \
+  -device 'pc-dimm,id=dimm67,memdev=mem-dimm67,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm68,size=2048M' \
+  -device 'pc-dimm,id=dimm68,memdev=mem-dimm68,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm69,size=2048M' \
+  -device 'pc-dimm,id=dimm69,memdev=mem-dimm69,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm70,size=2048M' \
+  -device 'pc-dimm,id=dimm70,memdev=mem-dimm70,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm71,size=2048M' \
+  -device 'pc-dimm,id=dimm71,memdev=mem-dimm71,node=1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc'
diff --git a/test/cfg2cmd/memory-hugepages-1g.conf b/test/cfg2cmd/memory-hugepages-1g.conf
new file mode 100644
index 0000000..8db2cca
--- /dev/null
+++ b/test/cfg2cmd/memory-hugepages-1g.conf
@@ -0,0 +1,11 @@
+# TEST: memory with 1GB hugepages
+# QEMU_VERSION: 3.0
+cores: 2
+memory: 8192
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hugepages: 1024
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-hugepages-1g.conf.cmd b/test/cfg2cmd/memory-hugepages-1g.conf.cmd
new file mode 100644
index 0000000..63792d2
--- /dev/null
+++ b/test/cfg2cmd/memory-hugepages-1g.conf.cmd
@@ -0,0 +1,30 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 8192 \
+  -object 'memory-backend-file,id=ram-node0,size=4096M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-file,id=ram-node1,size=4096M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc'
diff --git a/test/cfg2cmd/memory-hugepages-2m.conf b/test/cfg2cmd/memory-hugepages-2m.conf
new file mode 100644
index 0000000..f0d65fb
--- /dev/null
+++ b/test/cfg2cmd/memory-hugepages-2m.conf
@@ -0,0 +1,11 @@
+# TEST: memory with 2MB hugepages
+# QEMU_VERSION: 3.0
+cores: 2
+memory: 8192
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hugepages: 2
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-hugepages-2m.conf.cmd b/test/cfg2cmd/memory-hugepages-2m.conf.cmd
new file mode 100644
index 0000000..287c0ed
--- /dev/null
+++ b/test/cfg2cmd/memory-hugepages-2m.conf.cmd
@@ -0,0 +1,30 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 8192 \
+  -object 'memory-backend-file,id=ram-node0,size=4096M,mem-path=/run/hugepages/kvm/2048kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-file,id=ram-node1,size=4096M,mem-path=/run/hugepages/kvm/2048kB,share=on,prealloc=yes' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc'
diff --git a/test/run_config2command_tests.pl b/test/run_config2command_tests.pl
index f097811..9b49063 100755
--- a/test/run_config2command_tests.pl
+++ b/test/run_config2command_tests.pl
@@ -178,6 +178,27 @@ $qemu_server_config->mock(
     },
 );
 
+my $qemu_server_memory;
+$qemu_server_memory = Test::MockModule->new('PVE::QemuServer::Memory');
+$qemu_server_memory->mock(
+    hugepages_size => sub {
+	my ($conf, $size) = @_;
+
+	if ($conf->{hugepages} eq 'any') {
+	    return 1024;
+	} else {
+	    return $conf->{hugepages};
+	}
+    },
+    host_numanode_exist => sub {
+	my ($id) = @_;
+	return 1;
+    },
+    get_host_phys_address_bits => sub {
+	return 46;
+    }
+);
+
 my $pve_common_tools;
 $pve_common_tools = Test::MockModule->new('PVE::Tools');
 $pve_common_tools->mock(
-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests Alexandre Derumier
@ 2023-01-04  6:42 ` Alexandre Derumier
  2023-01-24 13:04   ` Fiona Ebner
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem Alexandre Derumier
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/API2/Qemu.pm          | 12 ++++++-
 PVE/QemuConfig.pm         |  4 +--
 PVE/QemuMigrate.pm        |  6 ++--
 PVE/QemuServer.pm         | 24 ++++++-------
 PVE/QemuServer/Helpers.pm |  3 +-
 PVE/QemuServer/Memory.pm  | 74 ++++++++++++++++++++++++++++++++-------
 6 files changed, 91 insertions(+), 32 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index c87602d..4ffa973 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -32,6 +32,7 @@ use PVE::QemuServer::Drive;
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Machine;
+use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
 use PVE::AccessControl;
@@ -1608,7 +1609,16 @@ my $update_vm_api  = sub {
 	}
 
 	if ($param->{memory} || defined($param->{balloon})) {
-	    my $maxmem = $param->{memory} || $conf->{pending}->{memory} || $conf->{memory} || $defaults->{memory};
+
+	    my $maxmem = undef;
+	    if ($param->{memory}) {
+	        $maxmem = get_current_memory($param->{memory});
+	    } elsif ($conf->{pending}->{memory}) {
+	        $maxmem = get_current_memory($conf->{pending}->{memory});
+	    } else {
+	        $maxmem = get_current_memory($conf->{memory});
+	    }
+
 	    my $balloon = defined($param->{balloon}) ? $param->{balloon} : $conf->{pending}->{balloon} || $conf->{balloon};
 
 	    die "balloon value too large (must be smaller than assigned memory)\n"
diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
index 051382c..999e658 100644
--- a/PVE/QemuConfig.pm
+++ b/PVE/QemuConfig.pm
@@ -12,6 +12,7 @@ use PVE::QemuServer::Helpers;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer;
 use PVE::QemuServer::Machine;
+use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::Storage;
 use PVE::Tools;
 use PVE::Format qw(render_bytes render_duration);
@@ -208,8 +209,7 @@ sub __snapshot_save_vmstate {
 	$target = PVE::QemuServer::find_vmstate_storage($conf, $storecfg);
     }
 
-    my $defaults = PVE::QemuServer::load_defaults();
-    my $mem_size = $conf->{memory} // $defaults->{memory};
+    my $mem_size = get_current_memory($conf->{memory});
     my $driver_state_size = 500; # assume 500MB is enough to safe all driver state;
     # our savevm-start does live-save of the memory until the space left in the
     # volume is just enough for the remaining memory content + internal state
diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 5e466d9..2eccf67 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -26,6 +26,7 @@ use PVE::QemuServer::Drive;
 use PVE::QemuServer::Helpers qw(min_version);
 use PVE::QemuServer::Machine;
 use PVE::QemuServer::Monitor qw(mon_cmd);
+use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer;
 
 use PVE::AbstractMigrate;
@@ -1024,7 +1025,8 @@ sub phase2_start_remote_cluster {
     my $remote_vmid = $self->{opts}->{remote}->{vmid};
 
     # like regular start but with some overhead accounted for
-    my $timeout = PVE::QemuServer::Helpers::config_aware_timeout($self->{vmconf}) + 10;
+    my $memory = get_current_memory($self->{vmconf}->{memory});
+    my $timeout = PVE::QemuServer::Helpers::config_aware_timeout($self->{vmconf}, $memory) + 10;
 
     my $res = PVE::Tunnel::write_tunnel($self->{tunnel}, $timeout, "start", $params);
 
@@ -1179,7 +1181,7 @@ sub phase2 {
     $qemu_migrate_params->{'downtime-limit'} = int($migrate_downtime);
 
     # set cachesize to 10% of the total memory
-    my $memory =  $conf->{memory} || $defaults->{memory};
+    my $memory = get_current_memory($conf->{memory});
     my $cachesize = int($memory * 1048576 / 10);
     $cachesize = round_powerof2($cachesize);
 
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 39fc6b0..5847a78 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -52,7 +52,7 @@ use PVE::QemuServer::CGroup;
 use PVE::QemuServer::CPUConfig qw(print_cpu_device get_cpu_options);
 use PVE::QemuServer::Drive qw(is_valid_drivename drive_is_cloudinit drive_is_cdrom drive_is_read_only parse_drive print_drive);
 use PVE::QemuServer::Machine;
-use PVE::QemuServer::Memory;
+use PVE::QemuServer::Memory qw(get_current_memory);
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::PCI qw(print_pci_addr print_pcie_addr print_pcie_root_port parse_hostpci);
 use PVE::QemuServer::USB qw(parse_usb_device);
@@ -340,11 +340,8 @@ my $confdesc = {
     },
     memory => {
 	optional => 1,
-	type => 'integer',
-	description => "Amount of RAM for the VM in MB. This is the maximum available memory when"
-	    ." you use the balloon device.",
-	minimum => 16,
-	default => 512,
+	type => 'string',
+	format => $PVE::QemuServer::Memory::memory_fmt
     },
     balloon => {
 	optional => 1,
@@ -2928,8 +2925,7 @@ sub vmstatus {
 	$d->{cpus} = $conf->{vcpus} if $conf->{vcpus};
 
 	$d->{name} = $conf->{name} || "VM $vmid";
-	$d->{maxmem} = $conf->{memory} ? $conf->{memory}*(1024*1024)
-	    : $defaults->{memory}*(1024*1024);
+	$d->{maxmem} = get_current_memory($conf->{memory})*(1024*1024);
 
 	if ($conf->{balloon}) {
 	    $d->{balloon_min} = $conf->{balloon}*(1024*1024);
@@ -5025,7 +5021,7 @@ sub vmconfig_hotplug_pending {
 		# enable balloon device is not hotpluggable
 		die "skip\n" if defined($conf->{balloon}) && $conf->{balloon} == 0;
 		# here we reset the ballooning value to memory
-		my $balloon = $conf->{memory} || $defaults->{memory};
+		my $balloon = get_current_memory($conf->{memory});
 		mon_cmd($vmid, "balloon", value => $balloon*1024*1024);
 	    } elsif ($fast_plug_option->{$opt}) {
 		# do nothing
@@ -5038,7 +5034,7 @@ sub vmconfig_hotplug_pending {
 		vmconfig_delete_or_detach_drive($vmid, $storecfg, $conf, $opt, $force);
 	    } elsif ($opt =~ m/^memory$/) {
 		die "skip\n" if !$hotplug_features->{memory};
-		PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults, $opt);
+		PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults);
 	    } elsif ($opt eq 'cpuunits') {
 		$cgroup->change_cpu_shares(undef);
 	    } elsif ($opt eq 'cpulimit') {
@@ -5093,7 +5089,8 @@ sub vmconfig_hotplug_pending {
 
 		# allow manual ballooning if shares is set to zero
 		if ((defined($conf->{shares}) && ($conf->{shares} == 0))) {
-		    my $balloon = $conf->{pending}->{balloon} || $conf->{memory} || $defaults->{memory};
+		    my $memory = get_current_memory($conf->{memory});
+		    my $balloon = $conf->{pending}->{balloon} || $memory;
 		    mon_cmd($vmid, "balloon", value => $balloon*1024*1024);
 		}
 	    } elsif ($opt =~ m/^net(\d+)$/) {
@@ -5113,7 +5110,7 @@ sub vmconfig_hotplug_pending {
 				     $vmid, $opt, $value, $arch, $machine_type);
 	    } elsif ($opt =~ m/^memory$/) { #dimms
 		die "skip\n" if !$hotplug_features->{memory};
-		$value = PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults, $opt, $value);
+		PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults, $conf->{pending}->{$opt});
 	    } elsif ($opt eq 'cpuunits') {
 		my $new_cpuunits = PVE::CGroup::clamp_cpu_shares($conf->{pending}->{$opt}); #clamp
 		$cgroup->change_cpu_shares($new_cpuunits);
@@ -5790,7 +5787,8 @@ sub vm_start_nolock {
 	push @$cmd, '-S';
     }
 
-    my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $resume);
+    my $memory = get_current_memory($conf->{memory});
+    my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $memory, $resume);
 
     my $pci_devices = {}; # host pci devices
     for (my $i = 0; $i < $PVE::QemuServer::PCI::MAX_HOSTPCI_DEVICES; $i++)  {
diff --git a/PVE/QemuServer/Helpers.pm b/PVE/QemuServer/Helpers.pm
index e91f906..9115d50 100644
--- a/PVE/QemuServer/Helpers.pm
+++ b/PVE/QemuServer/Helpers.pm
@@ -143,8 +143,7 @@ sub version_cmp {
 }
 
 sub config_aware_timeout {
-    my ($config, $is_suspended) = @_;
-    my $memory = $config->{memory};
+    my ($config, $memory, $is_suspended) = @_;
     my $timeout = 30;
 
     # Based on user reported startup time for vm with 512GiB @ 4-5 minutes
diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index 6c1cd94..59e51c8 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -8,10 +8,45 @@ use PVE::Exception qw(raise raise_param_exc);
 
 use PVE::QemuServer;
 use PVE::QemuServer::Monitor qw(mon_cmd);
+use base qw(Exporter);
+
+our @EXPORT_OK = qw(
+get_current_memory
+);
 
 my $MAX_NUMA = 8;
 my $STATICMEM = 1024;
 
+my $memory_fmt = {
+    current => {
+	description => "Current amount of online RAM for the VM in MB. This is the maximum available memory when"
+	    ." you use the balloon device.",
+	type => 'integer',
+	default_key => 1,
+	optional => 1,
+	minimum => 16,
+	default => 512,
+    }
+};
+
+sub print_memory {
+    my $memory = shift;
+
+    return PVE::JSONSchema::print_property_string($memory, $memory_fmt);
+}
+
+sub parse_memory {
+    my ($value) = @_;
+
+    my $current_default = $memory_fmt->{current}->{default};
+    my $res = { current => $current_default };
+    return $res if !defined($value);
+
+    $res = eval { PVE::JSONSchema::parse_property_string($memory_fmt, $value) };
+    die $@ if $@;
+    return $res;
+}
+
 my $_host_bits;
 my sub get_host_phys_address_bits {
     return $_host_bits if defined($_host_bits);
@@ -63,6 +98,13 @@ my sub get_max_mem {
     return $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
 }
 
+sub get_current_memory {
+    my ($value) = @_;
+
+    my $memory = parse_memory($value);
+    return $memory->{current};
+}
+
 sub get_numa_node_list {
     my ($conf) = @_;
     my @numa_map;
@@ -155,17 +197,21 @@ sub foreach_reverse_dimm {
 }
 
 sub qemu_memory_hotplug {
-    my ($vmid, $conf, $defaults, $opt, $value) = @_;
+    my ($vmid, $conf, $defaults, $value) = @_;
+
+    return if !PVE::QemuServer::check_running($vmid);
 
-    return $value if !PVE::QemuServer::check_running($vmid);
+    my $oldmem = parse_memory($conf->{memory});
+    my $newmem = parse_memory($value);
+
+    my $memory = $oldmem->{current};
+    $value = $newmem->{current};
+
+    return if $value == $memory;
 
     my $sockets = 1;
     $sockets = $conf->{sockets} if $conf->{sockets};
 
-    my $memory = $conf->{memory} || $defaults->{memory};
-    $value = $defaults->{memory} if !$value;
-    return $value if $value == $memory;
-
     my $static_memory = $STATICMEM;
     $static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
 
@@ -180,7 +226,7 @@ sub qemu_memory_hotplug {
 	foreach_dimm($conf, $vmid, $value, $sockets, sub {
 	    my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
 
-		return if $current_size <= $conf->{memory};
+		return if $current_size <= get_current_memory($conf->{memory});
 
 		if ($conf->{hugepages}) {
 		    $numa_hostmap = get_numa_guest_to_host_map($conf) if !$numa_hostmap;
@@ -219,7 +265,8 @@ sub qemu_memory_hotplug {
 		    die $err;
 		}
 		#update conf after each succesful module hotplug
-		$conf->{memory} = $current_size;
+		$newmem->{current} = $current_size;
+		$conf->{memory} = print_memory($newmem);
 		PVE::QemuConfig->write_config($vmid, $conf);
 	});
 
@@ -228,7 +275,8 @@ sub qemu_memory_hotplug {
 	foreach_reverse_dimm($conf, $vmid, $value, $sockets, sub {
 	    my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
 
-		return if $current_size >= $conf->{memory};
+		return if $current_size >= get_current_memory($conf->{memory});
+
 		print "try to unplug memory dimm $name\n";
 
 		my $retry = 0;
@@ -242,7 +290,8 @@ sub qemu_memory_hotplug {
 		}
 
 		#update conf after each succesful module unplug
-		$conf->{memory} = $current_size;
+		$newmem->{current} = $current_size;
+		$conf->{memory} = print_memory($newmem);
 
 		eval { PVE::QemuServer::qemu_objectdel($vmid, "mem-$name"); };
 		PVE::QemuConfig->write_config($vmid, $conf);
@@ -270,7 +319,8 @@ sub qemu_dimm_list {
 sub config {
     my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd) = @_;
 
-    my $memory = $conf->{memory} || $defaults->{memory};
+    my $memory = get_current_memory($conf->{memory});
+
     my $static_memory = 0;
 
     if ($hotplug_features->{memory}) {
@@ -493,7 +543,7 @@ sub hugepages_topology {
     return if !$conf->{numa};
 
     my $defaults = PVE::QemuServer::load_defaults();
-    my $memory = $conf->{memory} || $defaults->{memory};
+    my $memory = get_current_memory($conf->{memory});
     my $static_memory = 0;
     my $sockets = 1;
     $sockets = $conf->{smp} if $conf->{smp}; # old style - no longer iused
-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests Alexandre Derumier
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser Alexandre Derumier
@ 2023-01-04  6:42 ` Alexandre Derumier
  2023-01-24 13:04   ` Fiona Ebner
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option Alexandre Derumier
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/QemuServer/Memory.pm | 43 +++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index 59e51c8..e9c0115 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -15,7 +15,6 @@ get_current_memory
 );
 
 my $MAX_NUMA = 8;
-my $STATICMEM = 1024;
 
 my $memory_fmt = {
     current => {
@@ -47,6 +46,26 @@ sub parse_memory {
     return $res;
 }
 
+my sub get_static_mem {
+    my ($conf, $defaults) = @_;
+
+    my $sockets = $conf->{sockets} || $defaults->{sockets};
+    my $hotplug_features = PVE::QemuServer::parse_hotplug_features(defined($conf->{hotplug}) ? $conf->{hotplug} : '1');
+
+    my $static_memory = 0;
+    my $memory = parse_memory($conf->{memory});
+
+    if ($hotplug_features->{memory}) {
+	#legacy
+	$static_memory = 1024;
+	$static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
+    } else {
+	$static_memory = $memory->{current};
+    }
+
+    return $static_memory;
+}
+
 my $_host_bits;
 my sub get_host_phys_address_bits {
     return $_host_bits if defined($_host_bits);
@@ -212,8 +231,7 @@ sub qemu_memory_hotplug {
     my $sockets = 1;
     $sockets = $conf->{sockets} if $conf->{sockets};
 
-    my $static_memory = $STATICMEM;
-    $static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
+    my $static_memory = get_static_mem($conf, $defaults);
 
     die "memory can't be lower than $static_memory MB" if $value < $static_memory;
     my $MAX_MEM = get_max_mem($conf);
@@ -321,7 +339,7 @@ sub config {
 
     my $memory = get_current_memory($conf->{memory});
 
-    my $static_memory = 0;
+    my $static_memory = get_static_mem($conf, $defaults);
 
     if ($hotplug_features->{memory}) {
 	die "NUMA needs to be enabled for memory hotplug\n" if !$conf->{numa};
@@ -333,18 +351,10 @@ sub config {
 		if $conf->{"numa$i"};
 	}
 
-	my $sockets = 1;
-	$sockets = $conf->{sockets} if $conf->{sockets};
-
-	$static_memory = $STATICMEM;
-	$static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
-
 	die "minimum memory must be ${static_memory}MB\n" if($memory < $static_memory);
 	push @$cmd, '-m', "size=${static_memory},slots=255,maxmem=${MAX_MEM}M";
 
     } else {
-
-	$static_memory = $memory;
 	push @$cmd, '-m', $static_memory;
     }
 
@@ -544,20 +554,13 @@ sub hugepages_topology {
 
     my $defaults = PVE::QemuServer::load_defaults();
     my $memory = get_current_memory($conf->{memory});
-    my $static_memory = 0;
+    my $static_memory = get_static_mem($conf, $defaults);
     my $sockets = 1;
     $sockets = $conf->{smp} if $conf->{smp}; # old style - no longer iused
     $sockets = $conf->{sockets} if $conf->{sockets};
     my $numa_custom_topology = undef;
     my $hotplug_features = PVE::QemuServer::parse_hotplug_features(defined($conf->{hotplug}) ? $conf->{hotplug} : '1');
 
-    if ($hotplug_features->{memory}) {
-	$static_memory = $STATICMEM;
-	$static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
-    } else {
-	$static_memory = $memory;
-    }
-
     #custom numa topology
     for (my $i = 0; $i < $MAX_NUMA; $i++) {
 	next if !$conf->{"numa$i"};
-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (2 preceding siblings ...)
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem Alexandre Derumier
@ 2023-01-04  6:42 ` Alexandre Derumier
  2023-01-24 13:05   ` Fiona Ebner
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max Alexandre Derumier
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

'max' can only be a multiple of 64GB.
The dimm size is computed from the max memory.

We can have 64 slots:

64GB = 64 slots x 1GB
128GB = 64 slots x 2GB
...
4TB = 64 slots x 64GB

Also, with NUMA, we need to share slots between (up to 8) sockets,
and 64 is a multiple of 8:

64GB = 8 sockets * 8 slots * 1GB
128GB = 8 sockets * 8 slots * 2GB
...

And with virtio-mem, we can have 32000 blocks of 2MB minimum:

64GB = 32000 * 2MB
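
Since the schema works in MB, a quick worked example of the new
multiple-of-64GB check:

    # accepted: 65536 % 65536 == 0 (64G is a multiple of 64G)
    #   memory: current=32768,max=65536
    # rejected: 98304 % 65536 != 0 (96G), so parse_memory() dies
    #   memory: current=32768,max=98304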

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/QemuServer/Memory.pm | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index e9c0115..1c4f356 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -5,6 +5,7 @@ use warnings;
 
 use PVE::Tools qw(run_command lock_file lock_file_full file_read_firstline dir_glob_foreach);
 use PVE::Exception qw(raise raise_param_exc);
+use PVE::GuestHelpers qw(safe_string_ne safe_num_ne safe_boolean_ne);
 
 use PVE::QemuServer;
 use PVE::QemuServer::Monitor qw(mon_cmd);
@@ -25,7 +26,13 @@ my $memory_fmt = {
 	optional => 1,
 	minimum => 16,
 	default => 512,
-    }
+    },
+    max => {
+	type => 'integer',
+	optional => 1,
+	minimum => 65536,
+	maximum => 4194304
+    },
 };
 
 sub print_memory {
@@ -43,6 +51,9 @@ sub parse_memory {
 
     $res = eval { PVE::JSONSchema::parse_property_string($memory_fmt, $value) };
     die $@ if $@;
+
+    die "max memory needs to be a multiple of 64GB\n" if $res->{max} && $res->{max} % 65536 != 0;
+
     return $res;
 }
 
@@ -223,6 +234,11 @@ sub qemu_memory_hotplug {
     my $oldmem = parse_memory($conf->{memory});
     my $newmem = parse_memory($value);
 
+    # skip non hotpluggable value
+    if (safe_num_ne($newmem->{max}, $oldmem->{max})) {
+	die "skip\n";
+    }
+
     my $memory = $oldmem->{current};
     $value = $newmem->{current};
 
-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (3 preceding siblings ...)
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option Alexandre Derumier
@ 2023-01-04  6:42 ` Alexandre Derumier
  2023-01-24 13:05   ` Fiona Ebner
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined Alexandre Derumier
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:42 UTC (permalink / raw)
  To: pve-devel

Verify that the configured VM memory 'max' is not bigger than the
memory supported by the host CPU.

Also add an early check in the update VM API.
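
As a worked example of the host limit (following get_host_max_mem() in
the diff below, with the 46 physical address bits the tests mock):

    # heuristic: drop 20 bits to get MB, one more for QEMU overhead
    my $bits_to_max_mem = int(1 << (46 - 21));    # 33554432 MB (32 TiB)
    # then cap at 4 TiB:
    my $host_max_mem = $bits_to_max_mem > 4*1024*1024
        ? 4*1024*1024 : $bits_to_max_mem;         # -> 4194304 MB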

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/API2/Qemu.pm         | 48 ++++++++++++++++++++++++++--------------
 PVE/QemuServer/Memory.pm | 23 +++++++++++++++++--
 2 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 4ffa973..cab1e84 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -32,7 +32,7 @@ use PVE::QemuServer::Drive;
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Machine;
-use PVE::QemuServer::Memory qw(get_current_memory);
+use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem);
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
 use PVE::AccessControl;
@@ -476,6 +476,35 @@ my $create_disks = sub {
     return ($vollist, $res);
 };
 
+my $check_memory_param = sub {
+    my ($conf, $param) = @_;
+
+    my $mem = parse_memory($param->{memory});
+    my $host_max_mem = get_host_max_mem($conf);
+
+    if ($mem->{max}) {
+	die "memory max can't be bigger than supported cpu architecture $host_max_mem MB\n"
+	    if $mem->{max} > $host_max_mem;
+    }
+
+    if ($param->{memory} || defined($param->{balloon})) {
+
+	my $maxmem = undef;
+	if ($param->{memory}) {
+	    $maxmem = get_current_memory($param->{memory});
+	} elsif ($conf->{pending}->{memory}) {
+	    $maxmem = get_current_memory($conf->{pending}->{memory});
+	} else {
+	    $maxmem = get_current_memory($conf->{memory});
+	}
+
+	my $balloon = defined($param->{balloon}) ? $param->{balloon} : $conf->{pending}->{balloon} || $conf->{balloon};
+
+	die "balloon value too large (must be smaller than assigned memory)\n"
+	    if $balloon && $balloon > $maxmem;
+    }
+};
+
 my $check_cpu_model_access = sub {
     my ($rpcenv, $authuser, $new, $existing) = @_;
 
@@ -1608,22 +1637,7 @@ my $update_vm_api  = sub {
 	    }
 	}
 
-	if ($param->{memory} || defined($param->{balloon})) {
-
-	    my $maxmem = undef;
-	    if ($param->{memory}) {
-	        $maxmem = get_current_memory($param->{memory});
-	    } elsif ($conf->{pending}->{memory}) {
-	        $maxmem = get_current_memory($conf->{pending}->{memory});
-	    } else {
-	        $maxmem = get_current_memory($conf->{memory});
-	    }
-
-	    my $balloon = defined($param->{balloon}) ? $param->{balloon} : $conf->{pending}->{balloon} || $conf->{balloon};
-
-	    die "balloon value too large (must be smaller than assigned memory)\n"
-		if $balloon && $balloon > $maxmem;
-	}
+	&$check_memory_param($conf, $param);
 
 	PVE::Cluster::log_msg('info', $authuser, "update VM $vmid: " . join (' ', @paramarr));
 
diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index 1c4f356..20b9bf9 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -13,6 +13,8 @@ use base qw(Exporter);
 
 our @EXPORT_OK = qw(
 get_current_memory
+parse_memory
+get_host_max_mem
 );
 
 my $MAX_NUMA = 8;
@@ -94,7 +96,7 @@ my sub get_host_phys_address_bits {
     return; # undef, cannot really do anything..
 }
 
-my sub get_max_mem {
+sub get_host_max_mem {
     my ($conf) = @_;
 
     my $cpu = {};
@@ -125,7 +127,24 @@ my sub get_max_mem {
     # heuristic: remove 20 bits to get MB and half that as QEMU needs some overhead
     my $bits_to_max_mem = int(1<<($bits - 21));
 
-    return $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
+    my $host_max_mem = $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
+
+    return $host_max_mem;
+}
+
+my sub get_max_mem {
+    my ($conf) = @_;
+
+    my $host_max_mem = get_host_max_mem($conf);
+    my $mem = parse_memory($conf->{memory});
+
+    if ($mem->{max}) {
+	die "configured memory max can't be bigger than supported cpu architecture $host_max_mem MB\n"
+	    if $mem->{max} > $host_max_mem;
+	return $mem->{max};
+    }
+
+    return $host_max_mem;
 }
 
 sub get_current_memory {
-- 
2.30.2


* [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (4 preceding siblings ...)
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max Alexandre Derumier
@ 2023-01-04  6:43 ` Alexandre Derumier
  2023-01-24 13:06   ` Fiona Ebner
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests Alexandre Derumier
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:43 UTC (permalink / raw)
  To: pve-devel

The default kernel vhost config only supports 64 slots, for
performance reasons, since 2015.

The original memory hotplug code predates this and used QEMU's
maximum of 255 supported slots.

To reach the max memory (4TB), we used incremental dimm sizes.

Instead of a dynamic dimm size, use one static dimm size, computed
from max memory / 64.

We also force the static memory to at least 4GB, as memory under 4G
doesn't work well with unplug.
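
For instance, matching get_static_mem() in the diff below, with
max=65536 (64G) and 2 sockets:

    my $dimm_size     = 65536 / 64;                    # 1024 MB per dimm
    my $static_memory = $dimm_size * 2;                # 2048 MB ...
    $static_memory = 4096 if $static_memory < 4096;    # ... raised to the 4G floor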

Fix:
https://bugzilla.proxmox.com/show_bug.cgi?id=3446
https://bugzilla.proxmox.com/show_bug.cgi?id=1426
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/QemuServer/Memory.pm | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index 20b9bf9..b9136d2 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -18,6 +18,7 @@ get_host_max_mem
 );
 
 my $MAX_NUMA = 8;
+my $MAX_SLOTS = 64;
 
 my $memory_fmt = {
     current => {
@@ -68,7 +69,12 @@ my sub get_static_mem {
     my $static_memory = 0;
     my $memory = parse_memory($conf->{memory});
 
-    if ($hotplug_features->{memory}) {
+    if ($memory->{max}) {
+	my $dimm_size = $memory->{max} / $MAX_SLOTS;
+	#static mem can't be lower than 4G nor lower than one dimm size per socket
+	$static_memory = $dimm_size * $sockets;
+	$static_memory = 4096 if $static_memory < 4096;
+    } elsif ($hotplug_features->{memory}) {
 	#legacy
 	$static_memory = 1024;
 	$static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
@@ -185,14 +191,15 @@ sub foreach_dimm{
     my ($conf, $vmid, $memory, $sockets, $func) = @_;
 
     my $dimm_id = 0;
-    my $current_size = 0;
+    my $current_size = get_static_mem($conf);
     my $dimm_size = 0;
 
-    if($conf->{hugepages} && $conf->{hugepages} == 1024) {
-	$current_size = 1024 * $sockets;
+    my $confmem = parse_memory($conf->{memory});
+    if ($confmem->{max}) {
+	$dimm_size = $confmem->{max} / $MAX_SLOTS;
+    } elsif($conf->{hugepages} && $conf->{hugepages} == 1024) {
 	$dimm_size = 1024;
     } else {
-	$current_size = 1024;
 	$dimm_size = 512;
     }
 
@@ -209,7 +216,7 @@ sub foreach_dimm{
 	    &$func($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory);
 	    return  $current_size if $current_size >= $memory;
 	}
-	$dimm_size *= 2;
+	$dimm_size *= 2 if !$confmem->{max};
     }
 }
 
@@ -220,7 +227,12 @@ sub foreach_reverse_dimm {
     my $current_size = 0;
     my $dimm_size = 0;
 
-    if($conf->{hugepages} && $conf->{hugepages} == 1024) {
+    my $confmem = parse_memory($conf->{memory});
+    if ($confmem->{max}) {
+	$dimm_id = $MAX_SLOTS - 1;
+	$current_size = $confmem->{max};
+	$dimm_size = $confmem->{max} / $MAX_SLOTS;
+    } elsif ($conf->{hugepages} && $conf->{hugepages} == 1024) {
 	$current_size = 8355840;
 	$dimm_size = 131072;
     } else {
@@ -241,7 +253,7 @@ sub foreach_reverse_dimm {
 	    &$func($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory);
 	    return  $current_size if $current_size <= $memory;
 	}
-	$dimm_size /= 2;
+	$dimm_size /= 2 if !$confmem->{max};
     }
 }
 
@@ -387,7 +399,9 @@ sub config {
 	}
 
 	die "minimum memory must be ${static_memory}MB\n" if($memory < $static_memory);
-	push @$cmd, '-m', "size=${static_memory},slots=255,maxmem=${MAX_MEM}M";
+	my $confmem = parse_memory($conf->{memory});
+	my $slots = $confmem->{max} ? $MAX_SLOTS : 255;
+	push @$cmd, '-m', "size=${static_memory},slots=$slots,maxmem=${MAX_MEM}M";
 
     } else {
 	push @$cmd, '-m', $static_memory;
-- 
2.30.2




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (5 preceding siblings ...)
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined Alexandre Derumier
@ 2023-01-04  6:43 ` Alexandre Derumier
  2023-01-24 13:06   ` Fiona Ebner
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support Alexandre Derumier
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:43 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 test/cfg2cmd/memory-max-128G.conf     | 11 ++++
 test/cfg2cmd/memory-max-128G.conf.cmd | 86 +++++++++++++++++++++++++++
 test/cfg2cmd/memory-max-512G.conf     | 11 ++++
 test/cfg2cmd/memory-max-512G.conf.cmd | 58 ++++++++++++++++++
 4 files changed, 166 insertions(+)
 create mode 100644 test/cfg2cmd/memory-max-128G.conf
 create mode 100644 test/cfg2cmd/memory-max-128G.conf.cmd
 create mode 100644 test/cfg2cmd/memory-max-512G.conf
 create mode 100644 test/cfg2cmd/memory-max-512G.conf.cmd

diff --git a/test/cfg2cmd/memory-max-128G.conf b/test/cfg2cmd/memory-max-128G.conf
new file mode 100644
index 0000000..8f0328d
--- /dev/null
+++ b/test/cfg2cmd/memory-max-128G.conf
@@ -0,0 +1,11 @@
+# TEST: memory max 64GiB with 2 sockets, dimm size should be 1GiB && static memory should be 4GiB
+# QEMU_VERSION: 7.2
+cores: 2
+memory: 32768,max=65536
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-max-128G.conf.cmd b/test/cfg2cmd/memory-max-128G.conf.cmd
new file mode 100644
index 0000000..4e2c06d
--- /dev/null
+++ b/test/cfg2cmd/memory-max-128G.conf.cmd
@@ -0,0 +1,86 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=4096,slots=64,maxmem=65536M' \
+  -object 'memory-backend-ram,id=ram-node0,size=2048M' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-ram,id=ram-node1,size=2048M' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-ram,id=mem-dimm0,size=1024M' \
+  -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm1,size=1024M' \
+  -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm2,size=1024M' \
+  -device 'pc-dimm,id=dimm2,memdev=mem-dimm2,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm3,size=1024M' \
+  -device 'pc-dimm,id=dimm3,memdev=mem-dimm3,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm4,size=1024M' \
+  -device 'pc-dimm,id=dimm4,memdev=mem-dimm4,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm5,size=1024M' \
+  -device 'pc-dimm,id=dimm5,memdev=mem-dimm5,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm6,size=1024M' \
+  -device 'pc-dimm,id=dimm6,memdev=mem-dimm6,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm7,size=1024M' \
+  -device 'pc-dimm,id=dimm7,memdev=mem-dimm7,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm8,size=1024M' \
+  -device 'pc-dimm,id=dimm8,memdev=mem-dimm8,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm9,size=1024M' \
+  -device 'pc-dimm,id=dimm9,memdev=mem-dimm9,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm10,size=1024M' \
+  -device 'pc-dimm,id=dimm10,memdev=mem-dimm10,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm11,size=1024M' \
+  -device 'pc-dimm,id=dimm11,memdev=mem-dimm11,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm12,size=1024M' \
+  -device 'pc-dimm,id=dimm12,memdev=mem-dimm12,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm13,size=1024M' \
+  -device 'pc-dimm,id=dimm13,memdev=mem-dimm13,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm14,size=1024M' \
+  -device 'pc-dimm,id=dimm14,memdev=mem-dimm14,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm15,size=1024M' \
+  -device 'pc-dimm,id=dimm15,memdev=mem-dimm15,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm16,size=1024M' \
+  -device 'pc-dimm,id=dimm16,memdev=mem-dimm16,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm17,size=1024M' \
+  -device 'pc-dimm,id=dimm17,memdev=mem-dimm17,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm18,size=1024M' \
+  -device 'pc-dimm,id=dimm18,memdev=mem-dimm18,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm19,size=1024M' \
+  -device 'pc-dimm,id=dimm19,memdev=mem-dimm19,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm20,size=1024M' \
+  -device 'pc-dimm,id=dimm20,memdev=mem-dimm20,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm21,size=1024M' \
+  -device 'pc-dimm,id=dimm21,memdev=mem-dimm21,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm22,size=1024M' \
+  -device 'pc-dimm,id=dimm22,memdev=mem-dimm22,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm23,size=1024M' \
+  -device 'pc-dimm,id=dimm23,memdev=mem-dimm23,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm24,size=1024M' \
+  -device 'pc-dimm,id=dimm24,memdev=mem-dimm24,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm25,size=1024M' \
+  -device 'pc-dimm,id=dimm25,memdev=mem-dimm25,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm26,size=1024M' \
+  -device 'pc-dimm,id=dimm26,memdev=mem-dimm26,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm27,size=1024M' \
+  -device 'pc-dimm,id=dimm27,memdev=mem-dimm27,node=1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc+pve0'
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-max-512G.conf b/test/cfg2cmd/memory-max-512G.conf
new file mode 100644
index 0000000..082f75e
--- /dev/null
+++ b/test/cfg2cmd/memory-max-512G.conf
@@ -0,0 +1,11 @@
+# TEST: memory max 512GiB with 2 sockets, dimm size should be 8GiB && static memory should be 16GiB
+# QEMU_VERSION: 7.2
+cores: 2
+memory: 131072,max=524288
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-max-512G.conf.cmd b/test/cfg2cmd/memory-max-512G.conf.cmd
new file mode 100644
index 0000000..3c04b70
--- /dev/null
+++ b/test/cfg2cmd/memory-max-512G.conf.cmd
@@ -0,0 +1,58 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=16384,slots=64,maxmem=524288M' \
+  -object 'memory-backend-ram,id=ram-node0,size=8192M' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-ram,id=ram-node1,size=8192M' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-ram,id=mem-dimm0,size=8192M' \
+  -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm1,size=8192M' \
+  -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm2,size=8192M' \
+  -device 'pc-dimm,id=dimm2,memdev=mem-dimm2,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm3,size=8192M' \
+  -device 'pc-dimm,id=dimm3,memdev=mem-dimm3,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm4,size=8192M' \
+  -device 'pc-dimm,id=dimm4,memdev=mem-dimm4,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm5,size=8192M' \
+  -device 'pc-dimm,id=dimm5,memdev=mem-dimm5,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm6,size=8192M' \
+  -device 'pc-dimm,id=dimm6,memdev=mem-dimm6,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm7,size=8192M' \
+  -device 'pc-dimm,id=dimm7,memdev=mem-dimm7,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm8,size=8192M' \
+  -device 'pc-dimm,id=dimm8,memdev=mem-dimm8,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm9,size=8192M' \
+  -device 'pc-dimm,id=dimm9,memdev=mem-dimm9,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm10,size=8192M' \
+  -device 'pc-dimm,id=dimm10,memdev=mem-dimm10,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm11,size=8192M' \
+  -device 'pc-dimm,id=dimm11,memdev=mem-dimm11,node=1' \
+  -object 'memory-backend-ram,id=mem-dimm12,size=8192M' \
+  -device 'pc-dimm,id=dimm12,memdev=mem-dimm12,node=0' \
+  -object 'memory-backend-ram,id=mem-dimm13,size=8192M' \
+  -device 'pc-dimm,id=dimm13,memdev=mem-dimm13,node=1' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc+pve0'
\ No newline at end of file
-- 
2.30.2




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (6 preceding siblings ...)
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests Alexandre Derumier
@ 2023-01-04  6:43 ` Alexandre Derumier
  2023-01-24 13:06   ` Fiona Ebner
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests Alexandre Derumier
  2023-01-24 13:08 ` [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Fiona Ebner
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:43 UTC (permalink / raw)
  To: pve-devel

A 4GiB static memory is needed for DMA+boot memory, as this memory
is almost always unmovable and cannot be unplugged.

One virtio-mem PCI device is set up for each NUMA node on the pci.4
bridge.

virtio-mem uses a fixed number of 32000 blocks.
The blocksize is computed as (max memory - 4096) / 32000, with a
minimum of 2MiB to stay aligned with THP.
(A lower blocksize means a better chance to unplug memory.)
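
For example, with max = 524288 MiB (the value used by the virtio tests
in this series) and 4096 MiB of static memory:
(524288 - 4096) / 32000 = 16.256 MiB, rounded up to the next power of
two gives a 32 MiB blocksize (matching block-size=32M in the tests).
With max = 65536: (65536 - 4096) / 32000 = 1.92 MiB, which is then
clamped to the 4 MiB linux minimum (block-size=4M in the tests).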

Note: Currently, linux guests only support a 4MiB virtio blocksize;
2MiB support is in progress.

For hotplug/unplug, we try to allocate/unallocate the same amount of
memory, aligned to the blocksize, on each NUMA node if possible.
If a node is not able to reach its target memory (for example because
of an unmovable page on unplug), we retry and redispatch the remaining
memory on the other nodes.
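
A minimal sketch of the dispatch arithmetic (standalone, with assumed
example values; the actual implementation is in balance_virtiomem
below):

    use strict;
    use warnings;

    my $blocksize = 32;        # MiB, assumed
    my $target    = 28672;     # MiB to distribute, assumed
    my @nodes     = (0, 1);    # virtiomem devices without errors

    # every node but the last gets a blocksize-aligned share; the last
    # one absorbs the remainder, so the per-node targets sum to $target
    my $aligned   = int($target / @nodes / $blocksize) * $blocksize;
    my $remainder = $target - $aligned * (@nodes - 1);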

About hugepages:

For ordinary memory devices, such as DIMMs, we preallocate memory via the
memory backend for such use cases; however, with virtio-mem we're dealing
with sparse memory backends; preallocating the whole memory backend
destroys the whole purpose of virtio-mem.

Instead, we want to preallocate memory when actually exposing memory to the
VM dynamically, and fail plugging memory gracefully + warn the user in case
preallocation fails.
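
Concretely (taken from the hugepage test added later in this series),
the backend keeps 'share=on' but loses 'prealloc=yes', while the
virtio-mem device itself gains 'prealloc=on':

    -object 'memory-backend-file,id=mem-virtiomem0,...,share=on,reserve=off'
    -device 'virtio-mem-pci,...,prealloc=on,...'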

fixes:
https://bugzilla.proxmox.com/show_bug.cgi?id=931
https://bugzilla.proxmox.com/show_bug.cgi?id=2949
---

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 PVE/API2/Qemu.pm         |  10 +-
 PVE/QemuServer.pm        |   7 +-
 PVE/QemuServer/Memory.pm | 233 ++++++++++++++++++++++++++++++++++++---
 PVE/QemuServer/PCI.pm    |   8 ++
 4 files changed, 242 insertions(+), 16 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index cab1e84..42941ac 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -32,7 +32,7 @@ use PVE::QemuServer::Drive;
 use PVE::QemuServer::ImportDisk;
 use PVE::QemuServer::Monitor qw(mon_cmd);
 use PVE::QemuServer::Machine;
-use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem);
+use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem get_virtiomem_block_size);
 use PVE::QemuMigrate;
 use PVE::RPCEnvironment;
 use PVE::AccessControl;
@@ -487,6 +487,14 @@ my $check_memory_param = sub {
 	    if $mem->{max} > $host_max_mem;
     }
 
+    #unplug works better when memory is a multiple of the blocksize, to match the linux blocksize in bytes.
+    if ($mem->{virtio}) {
+	my $blocksize = get_virtiomem_block_size($conf);
+
+	die "memory needs to be a multiple of $blocksize MiB when virtiomem is enabled\n"
+	    if $mem->{current} % $blocksize != 0;
+    }
+
     if ($param->{memory} || defined($param->{balloon})) {
 
 	my $maxmem = undef;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 5847a78..51b29fc 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -3857,7 +3857,12 @@ sub config_to_command {
 	push @$cmd, get_cpu_options($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough);
     }
 
-    PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd);
+    my $mem_devices = {};
+    PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $mem_devices);
+    foreach my $id (sort keys %$mem_devices) {
+	my $pciaddr = print_pci_addr($id, $bridges, $arch, $machine_type);
+	push @$devices, "-device", "$mem_devices->{$id}$pciaddr";
+    }
 
     push @$cmd, '-S' if $conf->{freeze};
 
diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index b9136d2..6827004 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -3,6 +3,8 @@ package PVE::QemuServer::Memory;
 use strict;
 use warnings;
 
+use POSIX qw/ceil/;
+
 use PVE::Tools qw(run_command lock_file lock_file_full file_read_firstline dir_glob_foreach);
 use PVE::Exception qw(raise raise_param_exc);
 use PVE::GuestHelpers qw(safe_string_ne safe_num_ne safe_boolean_ne);
@@ -15,6 +17,7 @@ our @EXPORT_OK = qw(
 get_current_memory
 parse_memory
 get_host_max_mem
+get_virtiomem_block_size
 );
 
 my $MAX_NUMA = 8;
@@ -37,6 +40,12 @@ my $memory_fmt = {
 	minimum => 65536,
 	maximum => 4194304
     },
+    virtio => {
+	description => "enable virtio-mem memory",
+	type => 'boolean',
+	optional => 1,
+	default => 0,
+    }
 };
 
 sub print_memory {
@@ -69,7 +78,9 @@ my sub get_static_mem {
     my $static_memory = 0;
     my $memory = parse_memory($conf->{memory});
 
-    if ($memory->{max}) {
+    if ($memory->{virtio}) {
+	$static_memory = 4096;
+    } elsif ($memory->{max}) {
 	my $dimm_size = $memory->{max} / $MAX_SLOTS;
 	#static mem can't be lower than 4GiB nor lower than one dimm size per socket
 	$static_memory = $dimm_size * $sockets;
@@ -160,6 +171,134 @@ sub get_current_memory {
     return $memory->{current};
 }
 
+sub get_virtiomem_block_size {
+    my ($conf) = @_;
+
+    my $MAX_MEM = get_max_mem($conf);
+    my $static_memory = get_static_mem($conf);
+    my $memory = get_current_memory($conf->{memory});
+
+    #virtiomem can map up to 32000 blocks.
+    #try to use the lowest blocksize; lower = better chance to unplug memory.
+    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
+    #2MiB is the minimum to stay aligned with THP, but linux guests currently need at least 4MiB
+    $blocksize = 2**(ceil(log($blocksize)/log(2)));
+    $blocksize = 4 if $blocksize < 4;
+
+    return $blocksize;
+}
+
+my sub get_virtiomem_total_current {
+    my ($mems) = @_;
+    my $total = 0;
+    foreach my $id (keys %$mems) {
+	my $mem = $mems->{$id};
+	$total += $mem->{current};
+    }
+    return $total;
+}
+
+my sub get_virtiomem_total_noerror {
+    my ($mems) = @_;
+
+    my $total = 0;
+    foreach my $id (keys %$mems) {
+	my $mem = $mems->{$id};
+	next if $mem->{error};
+	$total++;
+    }
+    return $total;
+}
+
+my sub get_virtiomem_total_errors_size {
+    my ($mems) = @_;
+
+    my $size = 0;
+    foreach my $id (keys %$mems) {
+	my $mem = $mems->{$id};
+	next if !$mem->{error};
+	$size += $mem->{current};
+    }
+    return $size;
+}
+
+my sub balance_virtiomem {
+    my ($vmid, $virtiomems, $blocksize, $target_virtiomem_total) = @_;
+
+    my $virtiomem_total_noerror = get_virtiomem_total_noerror($virtiomems);
+
+    print "trying to balance memory on $virtiomem_total_noerror remaining virtiomems\n";
+
+    die "error: no more available blocks in virtiomem to balance the remaining memory\n" if $target_virtiomem_total < 0;
+    die "error: no more available virtiomem to balance the remaining memory\n" if $virtiomem_total_noerror == 0;
+
+    my $virtiomem_target_aligned = int( $target_virtiomem_total / $virtiomem_total_noerror / $blocksize) * $blocksize;
+    my $virtiomem_target_remaining = $target_virtiomem_total - ($virtiomem_target_aligned * ($virtiomem_total_noerror-1));
+
+    my $i = 0;
+    foreach my $id (sort keys %$virtiomems) {
+	my $virtiomem = $virtiomems->{$id};
+	next if $virtiomem->{error};
+	$i++;
+	my $virtiomem_target = $i == $virtiomem_total_noerror ? $virtiomem_target_remaining : $virtiomem_target_aligned;
+	$virtiomem->{completed} = 0;
+	$virtiomem->{retry} = 0;
+	$virtiomem->{target} = $virtiomem_target;
+
+	print "virtiomem$id: set-requested-size: $virtiomem_target\n";
+	mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem_target * 1024 * 1024);
+    }
+
+    while (1) {
+
+	sleep 1;
+	my $total_finished = 0;
+
+	foreach my $id (keys %$virtiomems) {
+
+	    my $virtiomem = $virtiomems->{$id};
+
+	    if ($virtiomem->{error} || $virtiomem->{completed}) {
+		$total_finished++;
+		next;
+	    }
+
+	    my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$id", property => "size");
+	    $virtiomem->{current} = $size / 1024 / 1024;
+	    print "virtiomem$id: last: $virtiomem->{last}, current: $virtiomem->{current}, target: $virtiomem->{target}\n";
+
+	    if ($virtiomem->{current} == $virtiomem->{target}) {
+		print "virtiomem$id: completed\n";
+		$virtiomem->{completed} = 1;
+		next;
+	    }
+
+	    if ($virtiomem->{current} != $virtiomem->{last}) {
+		#the value has changed, but the target is not reached yet
+		print "virtiomem$id: changed but target not reached yet\n";
+		$virtiomem->{retry} = 0;
+		$virtiomem->{last} = $virtiomem->{current};
+		next;
+	    }
+
+	    if ($virtiomem->{retry} >= 5) {
+		print "virtiomem$id: too many retries, setting error flag\n";
+		$virtiomem->{error} = 1;
+		#as the change is async, we don't want the value to change again after the api call
+		eval {
+		    mon_cmd($vmid, 'qom-set', path => "/machine/peripheral/virtiomem$id", property => "requested-size", value => $virtiomem->{current} * 1024 * 1024);
+		};
+	    }
+	    print "virtiomem$id: increase retry: $virtiomem->{retry}\n";
+	    $virtiomem->{retry}++;
+	}
+
+	my $nb_virtiomem = keys %$virtiomems;
+	print "total finished: $total_finished, number of virtiomems: $nb_virtiomem\n";
+	return if $total_finished == $nb_virtiomem;
+    }
+}
+
 sub get_numa_node_list {
     my ($conf) = @_;
     my @numa_map;
@@ -266,7 +405,8 @@ sub qemu_memory_hotplug {
     my $newmem = parse_memory($value);
 
     # skip non hotpluggable value
-    if (safe_num_ne($newmem->{max}, $oldmem->{max})) {
+    if (safe_num_ne($newmem->{max}, $oldmem->{max}) ||
+	safe_boolean_ne($newmem->{virtio}, $oldmem->{virtio})) {
 	die "skip\n";
     }
 
@@ -284,7 +424,43 @@ sub qemu_memory_hotplug {
     my $MAX_MEM = get_max_mem($conf);
     die "you cannot add more memory than max mem $MAX_MEM MB!\n" if $memory > $MAX_MEM;
 
-    if ($value > $memory) {
+    my $confmem = parse_memory($conf->{memory});
+
+    if ($confmem->{virtio}) {
+	my $blocksize = get_virtiomem_block_size($conf);
+
+	my $virtiomems = {};
+	for (my $i = 0; $i < $sockets; $i++) {
+	    my $size = mon_cmd($vmid, 'qom-get', path => "/machine/peripheral/virtiomem$i", property => "size");
+	    $size = $size / 1024 / 1024;
+	    $virtiomems->{$i} = {
+		current => $size,
+		last => $size,
+		error => 0,
+		completed => 0,
+		retry => 0
+	    };
+	}
+
+	while (1) {
+
+	    my $target_virtiomem_total = $value - $static_memory - get_virtiomem_total_errors_size($virtiomems);
+	    my $err;
+	    eval {
+	        balance_virtiomem($vmid, $virtiomems, $blocksize, $target_virtiomem_total);
+	    };
+	    $err = $@ if $@;
+
+	    my $current_memory = $static_memory + get_virtiomem_total_current($virtiomems);
+	    $newmem->{current} = $current_memory;
+	    $conf->{memory} = print_memory($newmem);
+	    PVE::QemuConfig->write_config($vmid, $conf);
+
+	    die $err if $err;
+	    last if $current_memory == $value;
+	}
+
+    } elsif ($value > $memory) {
 
 	my $numa_hostmap;
 
@@ -382,7 +558,7 @@ sub qemu_dimm_list {
 }
 
 sub config {
-    my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd) = @_;
+    my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $mem_devices) = @_;
 
     my $memory = get_current_memory($conf->{memory});
 
@@ -401,7 +577,10 @@ sub config {
 	die "minimum memory must be ${static_memory}MB\n" if($memory < $static_memory);
 	my $confmem = parse_memory($conf->{memory});
 	my $slots = $confmem->{max} ? $MAX_SLOTS : 255;
-	push @$cmd, '-m', "size=${static_memory},slots=$slots,maxmem=${MAX_MEM}M";
+	my $cmdstr = "size=${static_memory}";
+	$cmdstr .= ",slots=$slots" if !$confmem->{'virtio'};
+	$cmdstr .= ",maxmem=${MAX_MEM}M";
+	push @$cmd, '-m', $cmdstr;
 
     } else {
 	push @$cmd, '-m', $static_memory;
@@ -471,29 +650,55 @@ sub config {
     }
 
     if ($hotplug_features->{memory}) {
-	foreach_dimm($conf, $vmid, $memory, $sockets, sub {
-	    my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
 
-	    my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size);
+	my $confmem = parse_memory($conf->{memory});
 
-	    push @$cmd, "-object" , $mem_object;
-	    push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode";
+	if ($confmem->{'virtio'}) {
+	    my $MAX_MEM = get_max_mem($conf);
+	    my $node_maxmem = ($MAX_MEM - $static_memory) / $sockets;
+	    my $node_mem = ($memory - $static_memory) / $sockets;
+	    my $blocksize = get_virtiomem_block_size($conf);
 
-	    die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n"
-		if $current_size > $memory;
-	});
+	    for (my $i = 0; $i < $sockets; $i++)  {
+
+		my $id = "virtiomem$i";
+		my $mem_object = print_mem_object($conf, "mem-$id", $node_maxmem);
+		push @$cmd, "-object" , "$mem_object,reserve=off";
+
+		my $mem_device = "virtio-mem-pci,block-size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-$id,node=$i";
+		$mem_device .= ",prealloc=on" if $conf->{hugepages};
+		$mem_devices->{$id} = $mem_device;
+	    }
+	} else {
+
+	    foreach_dimm($conf, $vmid, $memory, $sockets, sub {
+		my ($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory) = @_;
+
+		my $mem_object = print_mem_object($conf, "mem-$name", $dimm_size);
+
+		push @$cmd, "-object" , $mem_object;
+		push @$cmd, "-device", "pc-dimm,id=$name,memdev=mem-$name,node=$numanode";
+
+		die "memory size ($memory) must be aligned to $dimm_size for hotplugging\n"
+		    if $current_size > $memory;
+	    });
+	}
     }
 }
 
 sub print_mem_object {
     my ($conf, $id, $size) = @_;
 
+    my $confmem = parse_memory($conf->{memory});
+
     if ($conf->{hugepages}) {
 
 	my $hugepages_size = hugepages_size($conf, $size);
 	my $path = hugepages_mount_path($hugepages_size);
 
-	return "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on,prealloc=yes";
+	my $object = "memory-backend-file,id=$id,size=${size}M,mem-path=$path,share=on";
+	$object .= ",prealloc=yes" if !$confmem->{virtio};
+	return $object;
     } else {
 	return "memory-backend-ram,id=$id,size=${size}M";
     }
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index a18b974..0187c74 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -249,6 +249,14 @@ sub get_pci_addr_map {
 	'scsihw2' => { bus => 4, addr => 1 },
 	'scsihw3' => { bus => 4, addr => 2 },
 	'scsihw4' => { bus => 4, addr => 3 },
+	'virtiomem0' => { bus => 4, addr => 4 },
+	'virtiomem1' => { bus => 4, addr => 5 },
+	'virtiomem2' => { bus => 4, addr => 6 },
+	'virtiomem3' => { bus => 4, addr => 7 },
+	'virtiomem4' => { bus => 4, addr => 8 },
+	'virtiomem5' => { bus => 4, addr => 9 },
+	'virtiomem6' => { bus => 4, addr => 10 },
+	'virtiomem7' => { bus => 4, addr => 11 },
     } if !defined($pci_addr_map);
     return $pci_addr_map;
 }
-- 
2.30.2




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (7 preceding siblings ...)
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support Alexandre Derumier
@ 2023-01-04  6:43 ` Alexandre Derumier
  2023-01-24 13:08   ` Fiona Ebner
  2023-01-24 13:08 ` [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Fiona Ebner
  9 siblings, 1 reply; 30+ messages in thread
From: Alexandre Derumier @ 2023-01-04  6:43 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 test/cfg2cmd/memory-virtio-hugepages-1G.conf  | 12 +++++++
 .../memory-virtio-hugepages-1G.conf.cmd       | 35 +++++++++++++++++++
 test/cfg2cmd/memory-virtio-max.conf           | 11 ++++++
 test/cfg2cmd/memory-virtio-max.conf.cmd       | 35 +++++++++++++++++++
 test/cfg2cmd/memory-virtio.conf               | 11 ++++++
 test/cfg2cmd/memory-virtio.conf.cmd           | 35 +++++++++++++++++++
 6 files changed, 139 insertions(+)
 create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf
 create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd
 create mode 100644 test/cfg2cmd/memory-virtio-max.conf
 create mode 100644 test/cfg2cmd/memory-virtio-max.conf.cmd
 create mode 100644 test/cfg2cmd/memory-virtio.conf
 create mode 100644 test/cfg2cmd/memory-virtio.conf.cmd

diff --git a/test/cfg2cmd/memory-virtio-hugepages-1G.conf b/test/cfg2cmd/memory-virtio-hugepages-1G.conf
new file mode 100644
index 0000000..f28c7f8
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio-hugepages-1G.conf
@@ -0,0 +1,12 @@
+# TEST: virtio-mem with 1GiB hugepages
+# QEMU_VERSION: 7.2
+cores: 2
+memory: 32768,virtio=1
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hugepages: 1024
+hotplug: memory
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd b/test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd
new file mode 100644
index 0000000..2e7de74
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd
@@ -0,0 +1,35 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=4096,maxmem=524288M' \
+  -object 'memory-backend-file,id=ram-node0,size=2048M,mem-path=/run/hugepages/kvm/1048576kB,share=on' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-file,id=ram-node1,size=2048M,mem-path=/run/hugepages/kvm/1048576kB,share=on' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-file,id=mem-virtiomem0,size=260096M,mem-path=/run/hugepages/kvm/1048576kB,share=on,reserve=off' \
+  -object 'memory-backend-file,id=mem-virtiomem1,size=260096M,mem-path=/run/hugepages/kvm/1048576kB,share=on,reserve=off' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'pci-bridge,id=pci.4,chassis_nr=4,bus=pci.1,addr=0x1c' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-mem-pci,block-size=32M,requested-size=14336M,id=virtiomem0,memdev=mem-virtiomem0,node=0,prealloc=on,bus=pci.4,addr=0x4' \
+  -device 'virtio-mem-pci,block-size=32M,requested-size=14336M,id=virtiomem1,memdev=mem-virtiomem1,node=1,prealloc=on,bus=pci.4,addr=0x5' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc+pve0'
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-virtio-max.conf b/test/cfg2cmd/memory-virtio-max.conf
new file mode 100644
index 0000000..59b8e94
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio-max.conf
@@ -0,0 +1,11 @@
+# TEST: memory max 64GiB with 2 sockets with virtio-mem
+# QEMU_VERSION: 7.2
+cores: 2
+memory: 32768,max=65536,virtio=1
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-virtio-max.conf.cmd b/test/cfg2cmd/memory-virtio-max.conf.cmd
new file mode 100644
index 0000000..6bda268
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio-max.conf.cmd
@@ -0,0 +1,35 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=4096,maxmem=65536M' \
+  -object 'memory-backend-ram,id=ram-node0,size=2048M' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-ram,id=ram-node1,size=2048M' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-ram,id=mem-virtiomem0,size=30720M,reserve=off' \
+  -object 'memory-backend-ram,id=mem-virtiomem1,size=30720M,reserve=off' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'pci-bridge,id=pci.4,chassis_nr=4,bus=pci.1,addr=0x1c' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-mem-pci,block-size=4M,requested-size=14336M,id=virtiomem0,memdev=mem-virtiomem0,node=0,bus=pci.4,addr=0x4' \
+  -device 'virtio-mem-pci,block-size=4M,requested-size=14336M,id=virtiomem1,memdev=mem-virtiomem1,node=1,bus=pci.4,addr=0x5' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc+pve0'
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-virtio.conf b/test/cfg2cmd/memory-virtio.conf
new file mode 100644
index 0000000..60c0264
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio.conf
@@ -0,0 +1,11 @@
+# TEST: virtio-mem without max defined
+# QEMU_VERSION: 7.2
+cores: 2
+memory: 32768,virtio=1
+name: simple
+numa: 1
+ostype: l26
+smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
+sockets: 2
+vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
+hotplug: memory
\ No newline at end of file
diff --git a/test/cfg2cmd/memory-virtio.conf.cmd b/test/cfg2cmd/memory-virtio.conf.cmd
new file mode 100644
index 0000000..f392636
--- /dev/null
+++ b/test/cfg2cmd/memory-virtio.conf.cmd
@@ -0,0 +1,35 @@
+/usr/bin/kvm \
+  -id 8006 \
+  -name 'simple,debug-threads=on' \
+  -no-shutdown \
+  -chardev 'socket,id=qmp,path=/var/run/qemu-server/8006.qmp,server=on,wait=off' \
+  -mon 'chardev=qmp,mode=control' \
+  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
+  -mon 'chardev=qmp-event,mode=control' \
+  -pidfile /var/run/qemu-server/8006.pid \
+  -daemonize \
+  -smbios 'type=1,uuid=7b10d7af-b932-4c66-b2c3-3996152ec465' \
+  -smp '4,sockets=2,cores=2,maxcpus=4' \
+  -nodefaults \
+  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
+  -vnc 'unix:/var/run/qemu-server/8006.vnc,password=on' \
+  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
+  -m 'size=4096,maxmem=524288M' \
+  -object 'memory-backend-ram,id=ram-node0,size=2048M' \
+  -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' \
+  -object 'memory-backend-ram,id=ram-node1,size=2048M' \
+  -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' \
+  -object 'memory-backend-ram,id=mem-virtiomem0,size=260096M,reserve=off' \
+  -object 'memory-backend-ram,id=mem-virtiomem1,size=260096M,reserve=off' \
+  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
+  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
+  -device 'pci-bridge,id=pci.4,chassis_nr=4,bus=pci.1,addr=0x1c' \
+  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
+  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
+  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
+  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
+  -device 'virtio-mem-pci,block-size=32M,requested-size=14336M,id=virtiomem0,memdev=mem-virtiomem0,node=0,bus=pci.4,addr=0x4' \
+  -device 'virtio-mem-pci,block-size=32M,requested-size=14336M,id=virtiomem1,memdev=mem-virtiomem1,node=1,bus=pci.4,addr=0x5' \
+  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
+  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
+  -machine 'type=pc+pve0'
\ No newline at end of file
-- 
2.30.2




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 1/9] test: add memory tests Alexandre Derumier
@ 2023-01-24 13:04   ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:04 UTC (permalink / raw)
  To: pve-devel, aderumier

Am 04.01.23 um 07:42 schrieb Alexandre Derumier:
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/QemuServer/Memory.pm                      |  11 +-
>  test/cfg2cmd/memory-hotplug-hugepages.conf    |  12 ++
>  .../cfg2cmd/memory-hotplug-hugepages.conf.cmd |  62 +++++++
>  test/cfg2cmd/memory-hotplug.conf              |  11 ++
>  test/cfg2cmd/memory-hotplug.conf.cmd          | 174 ++++++++++++++++++
>  test/cfg2cmd/memory-hugepages-1g.conf         |  11 ++
>  test/cfg2cmd/memory-hugepages-1g.conf.cmd     |  30 +++
>  test/cfg2cmd/memory-hugepages-2m.conf         |  11 ++
>  test/cfg2cmd/memory-hugepages-2m.conf.cmd     |  30 +++
>  test/run_config2command_tests.pl              |  21 +++
>  10 files changed, 371 insertions(+), 2 deletions(-)
>  create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf
>  create mode 100644 test/cfg2cmd/memory-hotplug-hugepages.conf.cmd
>  create mode 100644 test/cfg2cmd/memory-hotplug.conf
>  create mode 100644 test/cfg2cmd/memory-hotplug.conf.cmd
>  create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf
>  create mode 100644 test/cfg2cmd/memory-hugepages-1g.conf.cmd
>  create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf
>  create mode 100644 test/cfg2cmd/memory-hugepages-2m.conf.cmd
> 
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index f8fc534..6c1cd94 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -348,7 +348,7 @@ sub config {
>  	    my $numa_memory = ($static_memory / $sockets);
>  
>  	    for (my $i = 0; $i < $sockets; $i++)  {
> -		die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/" && $conf->{hugepages};
> +		die "host NUMA node$i doesn't exist\n" if !host_numanode_exist($i) && $conf->{hugepages};
>  
>  		my $mem_object = print_mem_object($conf, "ram-node$i", $numa_memory);
>  		push @$cmd, '-object', $mem_object;
> @@ -391,6 +391,13 @@ sub print_mem_object {
>  
>  }
>  
> +sub host_numanode_exist {
> +    my ($id) = @_;
> +
> +    return if ! -d "/sys/devices/system/node/node$id/";
> +    return 1;

Style nit: could be one line
return -d "/sys/devices/system/node/node$id/";

> +}
> +
>  sub print_numa_hostnodes {
>      my ($hostnodelists) = @_;
>  
> @@ -402,7 +409,7 @@ sub print_numa_hostnodes {
>  	$hostnodes .= "-$end" if defined($end);
>  	$end //= $start;
>  	for (my $i = $start; $i <= $end; ++$i ) {
> -	    die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";
> +	    die "host NUMA node$i doesn't exist\n" if !host_numanode_exist($i);
>  	}

Nit: Ideally, the above would be its own patch (or combined with
extracting the other sub suggested below)

>      }
>      return $hostnodes;

(...)

> diff --git a/test/run_config2command_tests.pl b/test/run_config2command_tests.pl
> index f097811..9b49063 100755
> --- a/test/run_config2command_tests.pl
> +++ b/test/run_config2command_tests.pl
> @@ -178,6 +178,27 @@ $qemu_server_config->mock(
>      },
>  );
>  
> +my $qemu_server_memory;
> +$qemu_server_memory = Test::MockModule->new('PVE::QemuServer::Memory');
> +$qemu_server_memory->mock(
> +    hugepages_size => sub {

Rather than mocking the whole thing, we could also extract the
sub { -d  "/sys/kernel/mm/hugepages/hugepages-". ($_[0] * 1024) ."kB" }
from the original into a named sub and only mock that. Then we'd get all
the extra logic for checks like ($size & 1023) == 0, etc.
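
A minimal sketch of that extraction (sub name assumed):

    sub hugepages_kernel_supported {
        my ($size) = @_; # size in MiB
        return -d "/sys/kernel/mm/hugepages/hugepages-" . ($size * 1024) . "kB";
    }

The tests would then only need to mock hugepages_kernel_supported()
and keep the real validation logic in hugepages_size().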

> +	my ($conf, $size) = @_;
> +
> +	if ($conf->{hugepages} eq 'any') {
> +	    return 1024;
> +	} else {
> +	    return $conf->{hugepages};
> +	}
> +    },
> +    host_numanode_exist => sub {
> +	my ($id) = @_;
> +	return 1;
> +    },
> +    get_host_phys_address_bits => sub {
> +	return 46;
> +    }
> +);
> +
>  my $pve_common_tools;
>  $pve_common_tools = Test::MockModule->new('PVE::Tools');
>  $pve_common_tools->mock(




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 2/9] add memory parser Alexandre Derumier
@ 2023-01-24 13:04   ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:04 UTC (permalink / raw)
  To: pve-devel, aderumier

Am 04.01.23 um 07:42 schrieb Alexandre Derumier:
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/API2/Qemu.pm          | 12 ++++++-
>  PVE/QemuConfig.pm         |  4 +--
>  PVE/QemuMigrate.pm        |  6 ++--
>  PVE/QemuServer.pm         | 24 ++++++-------
>  PVE/QemuServer/Helpers.pm |  3 +-
>  PVE/QemuServer/Memory.pm  | 74 ++++++++++++++++++++++++++++++++-------
>  6 files changed, 91 insertions(+), 32 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index c87602d..4ffa973 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -32,6 +32,7 @@ use PVE::QemuServer::Drive;
>  use PVE::QemuServer::ImportDisk;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
>  use PVE::QemuServer::Machine;
> +use PVE::QemuServer::Memory qw(get_current_memory);
>  use PVE::QemuMigrate;
>  use PVE::RPCEnvironment;
>  use PVE::AccessControl;
> @@ -1608,7 +1609,16 @@ my $update_vm_api  = sub {
>  	}
>  
>  	if ($param->{memory} || defined($param->{balloon})) {
> -	    my $maxmem = $param->{memory} || $conf->{pending}->{memory} || $conf->{memory} || $defaults->{memory};

Nit: seems like this was the only usage of $defaults in the whole
function, so the variable could be removed too

> +
> +	    my $maxmem = undef;
> +	    if ($param->{memory}) {
> +	        $maxmem = get_current_memory($param->{memory});
> +	    } elsif ($conf->{pending}->{memory}) {
> +	        $maxmem = get_current_memory($conf->{pending}->{memory});
> +	    } else {
> +	        $maxmem = get_current_memory($conf->{memory});
> +	    }
> +
>  	    my $balloon = defined($param->{balloon}) ? $param->{balloon} : $conf->{pending}->{balloon} || $conf->{balloon};
>  
>  	    die "balloon value too large (must be smaller than assigned memory)\n"

(...)

> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 39fc6b0..5847a78 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm

(...)

> @@ -5113,7 +5110,7 @@ sub vmconfig_hotplug_pending {
>  				     $vmid, $opt, $value, $arch, $machine_type);
>  	    } elsif ($opt =~ m/^memory$/) { #dimms
>  		die "skip\n" if !$hotplug_features->{memory};
> -		$value = PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults, $opt, $value);
> +		PVE::QemuServer::Memory::qemu_memory_hotplug($vmid, $conf, $defaults, $conf->{pending}->{$opt});

By removing the assignment to $value here, the passed-in value will be
written to the config below, which does not reflect reality. This is a
breaking change:

root@pve701 ~ # qm set 131 --memory 1024
update VM 131: -memory 1024
root@pve701 ~ # cat /etc/pve/qemu-server/131.conf | grep memory:
memory: 1024
root@pve701 ~ # qm start 131
root@pve701 ~ # qm set 131 --memory 1025
update VM 131: -memory 1025
root@pve701 ~ # cat /etc/pve/qemu-server/131.conf | grep memory:
memory: 1025

But it really is 1536, because a 512M dimm was added. And now, because
the value is wrong, the following (or with any higher value) fails:

root@pve701 ~ # qm set 131 --memory 1026
update VM 131: -memory 1026
400 Parameter verification failed.
memory: hotplug problem - VM 131 qmp command 'object-add' failed -
attempt to add duplicate property 'mem-dimm0' to object (type 'container')

qm set <vmid> [OPTIONS]

While the current behavior is arguably also a bit surprising, it
reflects the actual values, and doesn't fail:

root@pve701 ~ # qm set 131 --memory 1024
update VM 131: -memory 1024
root@pve701 ~ # cat /etc/pve/qemu-server/131.conf | grep memory:
memory: 1024
root@pve701 ~ # qm start 131
root@pve701 ~ # qm set 131 --memory 1025
update VM 131: -memory 1025
root@pve701 ~ # cat /etc/pve/qemu-server/131.conf | grep memory:
memory: 1536
root@pve701 ~ # qm set 131 --memory 1026
update VM 131: -memory 1026
try to unplug memory dimm dimm0
root@pve701 ~ # cat /etc/pve/qemu-server/131.conf | grep memory:
memory: 1024

Why not keep assigning and returning the value from
qemu_memory_hotplug() in the cases it's not already written there?

>  	    } elsif ($opt eq 'cpuunits') {
>  		my $new_cpuunits = PVE::CGroup::clamp_cpu_shares($conf->{pending}->{$opt}); #clamp
>  		$cgroup->change_cpu_shares($new_cpuunits);

(...)

> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 6c1cd94..59e51c8 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -8,10 +8,45 @@ use PVE::Exception qw(raise raise_param_exc);
>  
>  use PVE::QemuServer;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
> +use base qw(Exporter);
> +
> +our @EXPORT_OK = qw(
> +get_current_memory
> +);
>  
>  my $MAX_NUMA = 8;
>  my $STATICMEM = 1024;
>  
> +my $memory_fmt = {

Since QemuServer.pm references it, shouldn't it be 'our'?

> +    current => {
> +	description => "Current amount of online RAM for the VM in MB. This is the maximum available memory when"
> +	    ." you use the balloon device.",
> +	type => 'integer',
> +	default_key => 1,
> +	optional => 1,

Why optional?

> +	minimum => 16,
> +	default => 512,
> +    }
> +};
> +
> +sub print_memory {
> +    my $memory = shift;
> +
> +    return PVE::JSONSchema::print_property_string($memory, $memory_fmt);
> +}
> +
> +sub parse_memory {
> +    my ($value) = @_;
> +
> +    my $current_default = $memory_fmt->{current}->{default};
> +    my $res = { current => $current_default };
> +    return $res if !defined($value);

Style nit: Could be one line:
return { current => $memory_fmt->{current}->{default} } if !defined($value).

> +
> +    $res = eval { PVE::JSONSchema::parse_property_string($memory_fmt, $value) };
> +    die $@ if $@;

Style nit: No need for eval+die if the error is just passed on as-is

> +    return $res;
> +}
> +
>  my $_host_bits;
>  my sub get_host_phys_address_bits {
>      return $_host_bits if defined($_host_bits);
> @@ -63,6 +98,13 @@ my sub get_max_mem {
>      return $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
>  }
>  
> +sub get_current_memory {
> +    my ($value) = @_;
> +
> +    my $memory = parse_memory($value);
> +    return $memory->{current};
> +}
> +
>  sub get_numa_node_list {
>      my ($conf) = @_;
>      my @numa_map;
> @@ -155,17 +197,21 @@ sub foreach_reverse_dimm {
>  }
>  
>  sub qemu_memory_hotplug {
> -    my ($vmid, $conf, $defaults, $opt, $value) = @_;
> +    my ($vmid, $conf, $defaults, $value) = @_;

Removing the unused $opt from qemu_memory_hotplug() could also be its
own patch.

> +
> +    return if !PVE::QemuServer::check_running($vmid);
>  
> -    return $value if !PVE::QemuServer::check_running($vmid);
> +    my $oldmem = parse_memory($conf->{memory});
> +    my $newmem = parse_memory($value);
> +
> +    my $memory = $oldmem->{current};
> +    $value = $newmem->{current};
> +
> +    return if $value == $memory;
>  
>      my $sockets = 1;
>      $sockets = $conf->{sockets} if $conf->{sockets};
>  
> -    my $memory = $conf->{memory} || $defaults->{memory};
> -    $value = $defaults->{memory} if !$value;
> -    return $value if $value == $memory;
> -
>      my $static_memory = $STATICMEM;
>      $static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
>  

(...)

> @@ -270,7 +319,8 @@ sub qemu_dimm_list {
>  sub config {
>       my ($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd) = @_;>

The $defaults parameter can be dropped now

> -    my $memory = $conf->{memory} || $defaults->{memory};
> +    my $memory = get_current_memory($conf->{memory});
> +
>      my $static_memory = 0;
>
>      if ($hotplug_features->{memory}) {

(...)




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 3/9] memory: add get_static_mem Alexandre Derumier
@ 2023-01-24 13:04   ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:04 UTC (permalink / raw)
  To: pve-devel, aderumier

Am 04.01.23 um 07:42 schrieb Alexandre Derumier:
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/QemuServer/Memory.pm | 43 +++++++++++++++++++++-------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 59e51c8..e9c0115 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -15,7 +15,6 @@ get_current_memory
>  );
>  
>  my $MAX_NUMA = 8;
> -my $STATICMEM = 1024;
>  
>  my $memory_fmt = {
>      current => {
> @@ -47,6 +46,26 @@ sub parse_memory {
>      return $res;
>  }
>  
> +my sub get_static_mem {
> +    my ($conf, $defaults) = @_;

Rather than passing in $defaults, you can pass in $sockets. I think all
callers already have that available.

> +
> +    my $sockets = $conf->{sockets} || $defaults->{sockets};
> +    my $hotplug_features = PVE::QemuServer::parse_hotplug_features(defined($conf->{hotplug}) ? $conf->{hotplug} : '1');

Could also rather be passed-in as a parameter, maybe even as a simple
boolean to test for directly instead of $hotplug_features. In the longer
term, it'd be nice to get rid of the cyclic usage of PVE::QemuServer and
having such calls deep in private methods doesn't help there.

> +
> +    my $static_memory = 0;
> +    my $memory = parse_memory($conf->{memory});
> +
> +    if ($hotplug_features->{memory}) {
> +	#legacy
> +	$static_memory = 1024;
> +	$static_memory = $static_memory * $sockets if ($conf->{hugepages} && $conf->{hugepages} == 1024);
> +    } else {
> +	$static_memory = $memory->{current};
> +    }
> +
> +    return $static_memory;
> +}
> +
>  my $_host_bits;
>  my sub get_host_phys_address_bits {
>      return $_host_bits if defined($_host_bits);




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option Alexandre Derumier
@ 2023-01-24 13:05   ` Fiona Ebner
  2023-01-27 15:03     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:05 UTC (permalink / raw)
  To: pve-devel, aderumier

Am 04.01.23 um 07:42 schrieb Alexandre Derumier:
> max can be a multiple of 64GB only,
> The dimm size is computed from the max memory
> 
> we can have 64 slots:
> 
> 64GB = 64 slots x 1GB
> 128GB = 64 slots x 2GB
> ..
> 4TB = 64 slots x 64GB
> 
> Also, with numa, we need to share slot between (up to 8) sockets.
> 
> 64 is a multiple of 8,
> 
> 64GB = 8 sockets * 8 slots * 1GB
> 128GB = 8 sockets * 8 slots * 2GB
> ...
> 
> and with virtio-mem,
> we can have 32000 blocks of 2M minimum
> 
> 64GB = 32000 * 2MB

The units above should all be TiB, GiB and MiB, right? The virtio-mem
documentation also talks about MiB. But then 32000 * 2 MiB is only
62.5 GiB, not 64 GiB.

> 
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/QemuServer/Memory.pm | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index e9c0115..1c4f356 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -5,6 +5,7 @@ use warnings;
>  
>  use PVE::Tools qw(run_command lock_file lock_file_full file_read_firstline dir_glob_foreach);
>  use PVE::Exception qw(raise raise_param_exc);
> +use PVE::GuestHelpers qw(safe_string_ne safe_num_ne safe_boolean_ne);
>  
>  use PVE::QemuServer;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
> @@ -25,7 +26,14 @@ my $memory_fmt = {
>  	optional => 1,
>  	minimum => 16,
>  	default => 512,
> -    }
> +    },
> +    max => {
> +	type => 'integer',
> +	optional => 1,
> +	type => 'integer',
> +	minimum => 65536,
> +	maximum => 4194304
> +    },
>  };
>  
>  sub print_memory {
> @@ -43,6 +51,9 @@ sub parse_memory {
>  
>      $res = eval { PVE::JSONSchema::parse_property_string($memory_fmt, $value) };
>      die $@ if $@;
> +
> +    die "max memory need to be a multiple of 64GB" if $res->{max} && $res->{max} % 65536 != 0;

s/GB/GiB/
Missing newline at the end of error message

You could also add a dedicated verify method for the format, for example
like pve_verify_hotplug_features(). Then this check is already done at
parameter verification time.
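
A minimal sketch of such a verifier (name and error handling assumed,
following the pve_verify_hotplug_features() pattern):

    sub pve_verify_memory {
        my ($value, $noerr) = @_;
        my $mem = eval { parse_memory($value) };
        if (my $err = $@) {
            return if $noerr;
            die $err;
        }
        return $value;
    }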

> +
>      return $res;
>  }
>  
> @@ -223,6 +234,11 @@ sub qemu_memory_hotplug {
>      my $oldmem = parse_memory($conf->{memory});
>      my $newmem = parse_memory($value);
>  
> +    # skip non hotpluggable value
> +    if (safe_num_ne($newmem->{max}, $oldmem->{max})) {
> +	die "skip\n";
> +    }

Please move this to the call sites. The "die "skip""-logic should not
cross function boundaries.

> +
>      my $memory = $oldmem->{current};
>      $value = $newmem->{current};
>  




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max
  2023-01-04  6:42 ` [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max Alexandre Derumier
@ 2023-01-24 13:05   ` Fiona Ebner
  2023-01-27 15:15     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:05 UTC (permalink / raw)
  To: pve-devel, aderumier

Am 04.01.23 um 07:42 schrieb Alexandre Derumier:
> verify that defined vm memory max is not bigger than
> host cpu supported memory
> 
> Add early check in update vm api
> 
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/API2/Qemu.pm         | 48 ++++++++++++++++++++++++++--------------
>  PVE/QemuServer/Memory.pm | 23 +++++++++++++++++--
>  2 files changed, 52 insertions(+), 19 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 4ffa973..cab1e84 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -32,7 +32,7 @@ use PVE::QemuServer::Drive;
>  use PVE::QemuServer::ImportDisk;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
>  use PVE::QemuServer::Machine;
> -use PVE::QemuServer::Memory qw(get_current_memory);
> +use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem);
>  use PVE::QemuMigrate;
>  use PVE::RPCEnvironment;
>  use PVE::AccessControl;
> @@ -476,6 +476,35 @@ my $create_disks = sub {
>      return ($vollist, $res);
>  };
>  
> +my $check_memory_param = sub {
> +    my ($conf, $param) = @_;
> +
> +    my $mem = parse_memory($param->{memory});
> +    my $host_max_mem = get_host_max_mem($conf);
> +
> +    if ($mem->{max}) {
> +	die "memory max can't be bigger than supported cpu architecture $host_max_mem MB\n"

s/MB/MiB

> +	    if $mem->{max} > $host_max_mem;

Style nit: you could && both conditions to save lines

This could be part of the verifier sub suggested in the last patch

> +    }
> +

(...)

> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 1c4f356..20b9bf9 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -13,6 +13,8 @@ use base qw(Exporter);
>  
>  our @EXPORT_OK = qw(
>  get_current_memory
> +parse_memory
> +get_host_max_mem
>  );
>  
>  my $MAX_NUMA = 8;
> @@ -94,7 +96,7 @@ my sub get_host_phys_address_bits {
>      return; # undef, cannot really do anything..
>  }
>  
> -my sub get_max_mem {
> +sub get_host_max_mem {
>      my ($conf) = @_;
>  
>      my $cpu = {};
> @@ -125,7 +127,24 @@ my sub get_max_mem {
>      # heuristic: remove 20 bits to get MB and half that as QEMU needs some overhead
>      my $bits_to_max_mem = int(1<<($bits - 21));
>  
> -    return $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
> +    my $host_max_mem = $bits_to_max_mem > 4*1024*1024 ? 4*1024*1024 : $bits_to_max_mem;
> +
> +    return $host_max_mem;
> +}

Nit: useless change?

> +
> +my sub get_max_mem {
> +    my ($conf) = @_;
> +
> +    my $host_max_mem = get_host_max_mem($conf);
> +    my $mem = parse_memory($conf->{memory});
> +
> +    if ($mem->{max}) {
> +	die "configured memory max can't be bigger than supported cpu architecture $host_max_mem MB\n"

s/MB/MiB/

> +	    if $mem->{max} > $host_max_mem;
> +	return $mem->{max};
> +    }
> +
> +    return $host_max_mem;
>  }
>  
>  sub get_current_memory {





* Re: [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined Alexandre Derumier
@ 2023-01-24 13:06   ` Fiona Ebner
  2023-01-27 15:52     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:06 UTC (permalink / raw)
  To: pve-devel, aderumier

On 04.01.23 at 07:43, Alexandre Derumier wrote:
> @@ -185,14 +191,15 @@ sub foreach_dimm{
>      my ($conf, $vmid, $memory, $sockets, $func) = @_;
>  
>      my $dimm_id = 0;
> -    my $current_size = 0;
> +    my $current_size = get_static_mem($conf);

Nit: Using the new method could/should be part of patch 3/9 already

>      my $dimm_size = 0;
>  
> -    if($conf->{hugepages} && $conf->{hugepages} == 1024) {
> -	$current_size = 1024 * $sockets;
> +    my $confmem = parse_memory($conf->{memory});
> +    if ($confmem->{max}) {
> +	$dimm_size = $confmem->{max} / $MAX_SLOTS;
> +    } elsif($conf->{hugepages} && $conf->{hugepages} == 1024) {
>  	$dimm_size = 1024;
>      } else {
> -	$current_size = 1024;
>  	$dimm_size = 512;
>      }
>  

Question about the existing code: The loops below can count up to
$dimm_id 255, but in the commit message you say that there are at most
255 slots (so the highest ID is 254?). But yeah, it only becomes
relevant when going all the way to approximately 4 TiB.

> @@ -209,7 +216,7 @@ sub foreach_dimm{
>  	    &$func($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory);
>  	    return  $current_size if $current_size >= $memory;
>  	}
> -	$dimm_size *= 2;
> +	$dimm_size *= 2 if !$confmem->{max};
>      }
>  }
>  
> @@ -220,7 +227,12 @@ sub foreach_reverse_dimm {

Question about the existing code: There is
my $dimm_id = 253;
Shouldn't that start at 254 (highest valid ID we can count up to?).
Again only becomes relevant with a lot of memory.

>      my $current_size = 0;
>      my $dimm_size = 0;
>  
> -    if($conf->{hugepages} && $conf->{hugepages} == 1024) {
> +    my $confmem = parse_memory($conf->{memory});
> +    if ($confmem->{max}) {
> +	$dimm_id = $MAX_SLOTS - 1;
> +	$current_size = $confmem->{max};

Does this need to be $confmem->{max} + $static_size? See below for a
description of the issue. Didn't think about it in detail, so please
double check ;)

> +	$dimm_size = $confmem->{max} / $MAX_SLOTS;
> +    } elsif ($conf->{hugepages} && $conf->{hugepages} == 1024) {
>  	$current_size = 8355840;
>  	$dimm_size = 131072;
>      } else {

Nit: the loops below here are
    for (my $j = 0; $j < 8; $j++) {
        for (my $i = 0; $i < 32; $i++) {
so it looks like potentially iterating more often than $MAX_SLOTS and
reaching negative $dimm_ids. I know that we should always return from
the loop earlier than that, but maybe it can be improved by extracting
the inner part in a sub/closure and using different loops depending on
how many slots there are? Same applies to foreach_dimm().
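
As a rough sketch of what I mean (simplified: the NUMA node assignment
and the hugepages case are glossed over, and all variables come from
the surrounding foreach_dimm()):

my $total_slots = $confmem->{max} ? $MAX_SLOTS : 255;
my $plug_slot = sub {
    my ($dimm_id) = @_;
    my $name = "dimm${dimm_id}";
    my $numanode = $dimm_id % $sockets;
    $current_size += $dimm_size;
    $func->($conf, $vmid, $name, $dimm_size, $numanode, $current_size, $memory);
    return $current_size >= $memory; # tell the caller to stop
};
for my $dimm_id (0 .. $total_slots - 1) {
    last if $plug_slot->($dimm_id);
    # without 'max', the dimm size still doubles after every 32 slots
    $dimm_size *= 2 if !$confmem->{max} && ($dimm_id + 1) % 32 == 0;
}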


Real issue: something is wrong with the calculation for unplugging in
combination with 'max' (it uses the wrong dimm IDs):

> root@pve701 ~ # cat qmpqga.pm                    
> #!/bin/perl
> 
> use strict;
> use warnings;
> 
> use Data::Dumper;
> $Data::Dumper::Sortkeys = 1;
> use Time::HiRes qw(usleep ualarm gettimeofday tv_interval);
> 
> use PVE::QemuServer::Monitor qw(mon_cmd);
> 
> my $vmid = shift or die "need to specify vmid\n";
> 
> my $res = eval { mon_cmd($vmid, "query-memory-devices") };
> warn $@ if $@;
> for my $dimm ($res->@*) {
>     my ($id, $size) = $dimm->{data}->@{qw(id size)};
>     print "$id: $size\n";
> }
> $res = eval { mon_cmd($vmid, "query-memory-size-summary") };
> warn $@ if $@;
> print Dumper($res);
> 

> root@pve701 ~ # qm set 131 --memory 4096,max=65536
> update VM 131: -memory 4096,max=65536
> root@pve701 ~ # qm start 131
> root@pve701 ~ # perl qmpqga.pm 131
> $VAR1 = {
>           'base-memory' => 4294967296,
>           'plugged-memory' => 0
>         };
> root@pve701 ~ # qm set 131 --memory 8192,max=65536
> update VM 131: -memory 8192,max=65536
> root@pve701 ~ # perl qmpqga.pm 131
> dimm0: 1073741824
> dimm1: 1073741824
> dimm2: 1073741824
> dimm3: 1073741824
> $VAR1 = {
>           'base-memory' => 4294967296,
>           'plugged-memory' => 4294967296
>         };
> root@pve701 ~ # qm set 131 --memory 4096,max=65536
> update VM 131: -memory 4096,max=65536
> try to unplug memory dimm dimm7
> try to unplug memory dimm dimm6
> try to unplug memory dimm dimm5
> try to unplug memory dimm dimm4

Those are the wrong IDs, so the memory stays plugged.

> root@pve701 ~ # perl qmpqga.pm 131
> dimm0: 1073741824
> dimm1: 1073741824
> dimm2: 1073741824
> dimm3: 1073741824
> $VAR1 = {
>           'base-memory' => 4294967296,
>           'plugged-memory' => 4294967296
>         };






* Re: [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 7/9] test: add memory-max tests Alexandre Derumier
@ 2023-01-24 13:06   ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:06 UTC (permalink / raw)
  To: pve-devel, aderumier

On 04.01.23 at 07:43, Alexandre Derumier wrote:
> diff --git a/test/cfg2cmd/memory-max-128G.conf.cmd b/test/cfg2cmd/memory-max-128G.conf.cmd
> new file mode 100644
> index 0000000..4e2c06d
> --- /dev/null
> +++ b/test/cfg2cmd/memory-max-128G.conf.cmd
> @@ -0,0 +1,86 @@

(...)

> +  -object 'memory-backend-ram,id=mem-dimm27,size=1024M' \
> +  -device 'pc-dimm,id=dimm27,memdev=mem-dimm27,node=1' \
> +   -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \

Nit: extra space (doesn't break the test though ;))

> +  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
> +  -device 'vmgenid,guid=c773c261-d800-4348-9f5d-167fadd53cf8' \
> +  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
> +  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
> +  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
> +  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' \
> +  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:aabbccddeeff' \
> +  -machine 'type=pc+pve0'
> \ No newline at end of file





* Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support Alexandre Derumier
@ 2023-01-24 13:06   ` Fiona Ebner
  2023-01-25  9:00     ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:06 UTC (permalink / raw)
  To: pve-devel, aderumier

On 04.01.23 at 07:43, Alexandre Derumier wrote:
> a 4GB static memory is needed for DMA+boot memory, as this memory
> is almost never unpluggable.
> 
> 1 virtio-mem pci device is set up for each numa node on the pci.4 bridge
> 
> virtio-mem uses a fixed blocksize with 32000 blocks
> The blocksize is computed from (maxmemory-4096)/32000 with a minimum of
> 2MB to map THP.
> (lower blocksize = more chance to unplug memory).
> 
> Note: Currently, linux only supports a 4MB virtio blocksize; 2MB support
> is currently in progress.
> 

For the above paragraphs:
s/GB/GiB/
s/MB/MiB/
?

> For hotplug/unplug, we try to allocate/deallocate the same amount
> of memory aligned to the blocksize on each numa node if possible.
> If a node is not able to reach the target memory (could be an unmovable
> page on unplug for example), we try again to redispatch the
> remaining memory on the other nodes.
> 
> About hugepages:
> 
> For ordinary memory devices, such as DIMMs, we preallocate memory via the
> memory backend for such use cases; however, with virtio-mem we're dealing
> with sparse memory backends; preallocating the whole memory backend
> destroys the whole purpose of virtio-mem.
> 
> Instead, we want to preallocate memory when actually exposing memory to the
> VM dynamically, and fail plugging memory gracefully + warn the user in case
> preallocation fails.
> 
> fixes:
> https://bugzilla.proxmox.com/show_bug.cgi?id=931

As said last time, one user in the report has Windows, which doesn't
support virtio-mem, so I wouldn't consider the issue fixed.

> https://bugzilla.proxmox.com/show_bug.cgi?id=2949
> ---
> 
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  PVE/API2/Qemu.pm         |  10 +-
>  PVE/QemuServer.pm        |   7 +-
>  PVE/QemuServer/Memory.pm | 233 ++++++++++++++++++++++++++++++++++++---
>  PVE/QemuServer/PCI.pm    |   8 ++
>  4 files changed, 242 insertions(+), 16 deletions(-)
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index cab1e84..42941ac 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -32,7 +32,7 @@ use PVE::QemuServer::Drive;
>  use PVE::QemuServer::ImportDisk;
>  use PVE::QemuServer::Monitor qw(mon_cmd);
>  use PVE::QemuServer::Machine;
> -use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem);
> +use PVE::QemuServer::Memory qw(get_current_memory parse_memory get_host_max_mem get_virtiomem_block_size);
>  use PVE::QemuMigrate;
>  use PVE::RPCEnvironment;
>  use PVE::AccessControl;
> @@ -487,6 +487,14 @@ my $check_memory_param = sub {
>  	    if $mem->{max} > $host_max_mem;
>      }
>  
> +    #unplug works better with 128MB by dimm to match the linux blocksize btyes.

s/MB/MiB/
s/btyes/bytes/

> +    if ($mem->{virtio}) {
> +	my $blocksize = get_virtiomem_block_size($conf);
> +
> +	die "memory need to be a multiple of $blocksize MB when virtiomem is enabled\n"

s/MB/MiB/
s/need/needs/

> +	    if $mem->{current} % $blocksize != 0;
> +    }
> +
>      if ($param->{memory} || defined($param->{balloon})) {
>  
>  	my $maxmem = undef;
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 5847a78..51b29fc 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -3857,7 +3857,12 @@ sub config_to_command {
>  	push @$cmd, get_cpu_options($conf, $arch, $kvm, $kvm_off, $machine_version, $winversion, $gpu_passthrough);
>      }
>  
> -    PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd);
> +    my $mem_devices = {};
> +    PVE::QemuServer::Memory::config($conf, $vmid, $sockets, $cores, $defaults, $hotplug_features, $cmd, $mem_devices);
> +    foreach my $id (sort keys %$mem_devices) {
> +	my $pciaddr = print_pci_addr($id, $bridges, $arch, $machine_type);
> +	push @$devices, "-device", "$mem_devices->{$id}$pciaddr";
> +    }
>  
>      push @$cmd, '-S' if $conf->{freeze};
>  
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index b9136d2..6827004 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -3,6 +3,8 @@ package PVE::QemuServer::Memory;
>  use strict;
>  use warnings;
>  
> +use POSIX qw/ceil/;

Style nit: haven't seen this variant yet, usually qw() is used in our
codebase

> +
>  use PVE::Tools qw(run_command lock_file lock_file_full file_read_firstline dir_glob_foreach);
>  use PVE::Exception qw(raise raise_param_exc);
>  use PVE::GuestHelpers qw(safe_string_ne safe_num_ne safe_boolean_ne);
> @@ -15,6 +17,7 @@ our @EXPORT_OK = qw(
>  get_current_memory
>  parse_memory
>  get_host_max_mem
> +get_virtiomem_block_size
>  );
>  
>  my $MAX_NUMA = 8;
> @@ -37,6 +40,12 @@ my $memory_fmt = {
>  	minimum => 65536,
>  	maximum => 4194304
>      },
> +    virtio => {
> +	description => "enable virtio-mem memory",

Again, we really should note that the feature is a technology preview
and only supported on recent enough Linux kernels.

> +	type => 'boolean',
> +	optional => 1,
> +	default => 0,
> +    }
>  };
>  
>  sub print_memory {
> @@ -69,7 +78,9 @@ my sub get_static_mem {
>      my $static_memory = 0;
>      my $memory = parse_memory($conf->{memory});
>  
> -    if ($memory->{max}) {
> +    if ($memory->{virtio}) {
> +	$static_memory = 4096;
> +    } elsif ($memory->{max}) {
>  	my $dimm_size = $memory->{max} / $MAX_SLOTS;
>  	#static mem can't be lower than 4G and lower than 1 dimmsize by socket
>  	$static_memory = $dimm_size * $sockets;
> @@ -160,6 +171,134 @@ sub get_current_memory {
>      return $memory->{current};
>  }
>  
> +sub get_virtiomem_block_size {
> +    my ($conf) = @_;
> +
> +    my $MAX_MEM = get_max_mem($conf);
> +    my $static_memory = get_static_mem($conf);
> +    my $memory = get_current_memory($conf->{memory});
> +
> +    #virtiomem can map 32000 block size.
> +    #try to use lowest blocksize, lower = more chance to unplug memory.
> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
> +    #2MB is the minimum to be aligned with THP
> +    $blocksize = 2**(ceil(log($blocksize)/log(2)));
> +    $blocksize = 4 if $blocksize < 4;

Why suddenly 4?

As discussed last time, the future-proof way to calculate this (if we
want minimum 2) without risking any negative result for log($blocksize), is
my $blocksize = ($MAX_MEM - $static_memory) / 32000;
$blocksize = 2 if $blocksize < 2;
$blocksize = 2**(ceil(log($blocksize)/log(2)));
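
For concreteness, the patch's version with max=65536 and 4096 MiB
static memory computes (65536 - 4096) / 32000 = 1.92, which rounds up
to the power of two 2 MiB and is then raised to 4 MiB by the clamp;
with max=131072 the rounding alone already yields 4 MiB.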

> +
> +    return $blocksize;
> +}
> +
> +my sub get_virtiomem_total_current {
> +    my ($mems) = @_;
> +    my $total = 0;
> +    foreach my $id (keys %$mems) {

Style nit: use for instead of foreach (same for all others below)
Nit: can iterate over values instead of keys

Same for the two functions below.

> +	my $mem = $mems->{$id};
> +	$total += $mem->{current};
> +    }
> +    return $total;

Below, you use total for a count, not a size. This is confusing naming.
Here, size should be used.

> +}
> +
> +my sub get_virtiomem_total_noerror {
> +    my ($mems) = @_;
> +
> +    my $total = 0;
> +    foreach my $id (keys %$mems) {
> +	my $mem = $mems->{$id};
> +	next if $mem->{error};
> +	$total++;
> +    }
> +    return $total;

I think the whole function could be
scalar(grep { !$_->{error} } values $mems->%*)
and since there's only one caller, you can inline it there.

> +}
> +
> +my sub get_virtiomem_total_errors_size {
> +    my ($mems) = @_;
> +
> +    my $size = 0;
> +    foreach my $id (keys %$mems) {
> +	my $mem = $mems->{$id};
> +	next if !$mem->{error};
> +	$size += $mem->{current};
> +    }
> +    return $size;
> +}
> +
> +my sub balance_virtiomem {

This function is rather difficult to read. The "record errors and
filter" logic really should be its own patch after the initial support.
FWIW, I tried my best and it does seem fine :)

But it's not clear to me that we even want that logic? Is it really that
common for qom-set to take so long to be worth introducing all this
additional handling/complexity? Or should it just be a hard error if
qom-set still didn't have an effect on a device after 5 seconds.

Would it actually be better to just fill up the first, then the second
etc. as needed, rather than balancing? My gut feeling is that having
fewer "active" devices is better. But this would have to be tested with
some benchmarks of course.

> +    my ($vmid, $virtiomems, $blocksize, $target_virtiomem_total) = @_;
> +
> +    my $virtiomem_total_noerror = get_virtiomem_total_noerror($virtiomems);
> +
> +    print"try to balance memory on $virtiomem_total_noerror remaining virtiomems\n";

Style nit: missing space after print (same for some others below, also
missing space after if a few times)

> +
> +    die "error. no more available blocks in virtiomem to balance the remaining memory" if $target_virtiomem_total < 0;
> +    die "error. No more available virtiomem to balance the remaining memory\n" if $virtiomem_total_noerror == 0;

Style nit: lines > 100 characters (same for some others below).





* Re: [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests Alexandre Derumier
@ 2023-01-24 13:08   ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:08 UTC (permalink / raw)
  To: pve-devel, aderumier

On 04.01.23 at 07:43, Alexandre Derumier wrote:
> Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
> ---
>  test/cfg2cmd/memory-virtio-hugepages-1G.conf  | 12 +++++++
>  .../memory-virtio-hugepages-1G.conf.cmd       | 35 +++++++++++++++++++
>  test/cfg2cmd/memory-virtio-max.conf           | 11 ++++++
>  test/cfg2cmd/memory-virtio-max.conf.cmd       | 35 +++++++++++++++++++
>  test/cfg2cmd/memory-virtio.conf               | 11 ++++++
>  test/cfg2cmd/memory-virtio.conf.cmd           | 35 +++++++++++++++++++
>  6 files changed, 139 insertions(+)
>  create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf
>  create mode 100644 test/cfg2cmd/memory-virtio-hugepages-1G.conf.cmd
>  create mode 100644 test/cfg2cmd/memory-virtio-max.conf
>  create mode 100644 test/cfg2cmd/memory-virtio-max.conf.cmd
>  create mode 100644 test/cfg2cmd/memory-virtio.conf
>  create mode 100644 test/cfg2cmd/memory-virtio.conf.cmd
> 
> diff --git a/test/cfg2cmd/memory-virtio-hugepages-1G.conf b/test/cfg2cmd/memory-virtio-hugepages-1G.conf
> new file mode 100644
> index 0000000..f28c7f8
> --- /dev/null
> +++ b/test/cfg2cmd/memory-virtio-hugepages-1G.conf
> @@ -0,0 +1,12 @@
> +# TEST: virtio-mem with memorymax defined

s/with/without/ or s/defined/undefined/

> +# QEMU_VERSION: 7.2
> +cores: 2
> +memory: 32768,virtio=1
> +name: simple
> +numa: 1
> +ostype: l26
> +smbios1: uuid=7b10d7af-b932-4c66-b2c3-3996152ec465
> +sockets: 2
> +vmgenid: c773c261-d800-4348-9f5d-167fadd53cf8
> +hugepages: 1024
> +hotplug: memory
> \ No newline at end of file





* Re: [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem
  2023-01-04  6:42 [pve-devel] [PATCH v2 qemu-server 0/9] rework memory hotplug + virtiomem Alexandre Derumier
                   ` (8 preceding siblings ...)
  2023-01-04  6:43 ` [pve-devel] [PATCH v2 qemu-server 9/9] tests: add virtio-mem tests Alexandre Derumier
@ 2023-01-24 13:08 ` Fiona Ebner
  9 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-24 13:08 UTC (permalink / raw)
  To: pve-devel, aderumier

On 04.01.23 at 07:42, Alexandre Derumier wrote:
> This patch series rework the current memory hotplug + virtiomem.
> 
> memory option now have extra options:
> 
> memory: [[current=]<integer>] [,max=<enum>] [,virtio=<1|0>]
> ex: memory: current=1024,max=131072,virtio=1
> 
> 
> patches 1-2: add a memory parser
> 
> patches 3-7: add the max option with 64 static dimm hotplug
> 
> for classic memory hotplug, when maxmemory is defined,
> we use 64 fixed size dimm.
> The max option is a multiple of 64GB.
> 
> patches 8-9: add virtio-mem
> 
> The virtio option enable new virtio-mem support,
> instead of plugging dimm, it's add/removed block inside
> big dimm.
> virtio-mem can use 32000 blocks, the blocksize is compute from
> max memory.
> 
> 
> Changelog v2:
> 
> update differents patches based on Fiona comments.
> (I have send 2 others mails with comments not yet addressed)
> 
> Biggest change is on virtio-mem, instead of trying to have same amount of memory
> on each virtiomem (some block could be unmovable and break unplug),
> we try to balance/dispatch remaining block on other available virtiomems.
> 
> Also, the minimum blocksize supported by linux guest os is 4MB currently,
> even if virtiomem can use 2MB on qemu side.
> 
> Patch10 with hotplug fix has be merged in others patches.

Thank you for tackling this! Most of my comments are nits/suggestions
for improvements, but there are a few real issues I found. The issues
and some of the more important suggestions (details in the patches):

Patch 2/9 makes a breaking change by changing what value gets
written to the config.

In patch 4/9, the die "skip"-logic really should be at the call sites instead.

In patch 6/9 something is wrong with the calculation for unplugging in
combination with 'max' (it uses the wrong dimm IDs).

In patch 8/9 you really should mention that virtio-mem is a technology
preview. Also the whole "retry handling/marking as error" logic is a bit
much, and would ideally be its own patch, if we even want this much
complexity.

Missing adaptation for ha-manager.





* Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-24 13:06   ` Fiona Ebner
@ 2023-01-25  9:00     ` DERUMIER, Alexandre
  2023-01-25  9:54       ` Fiona Ebner
  0 siblings, 1 reply; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-25  9:00 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

On Tuesday, 24 January 2023 at 14:06 +0100, Fiona Ebner wrote:
> On 04.01.23 at 07:43, Alexandre Derumier wrote:
> > a 4GB static memory is needed for DMA+boot memory, as this memory
> > is almost never unpluggable.
> > 
> > 1 virtio-mem pci device is set up for each numa node on the pci.4
> > bridge
> > 
> > virtio-mem uses a fixed blocksize with 32000 blocks
> > The blocksize is computed from (maxmemory-4096)/32000 with a minimum
> > of 2MB to map THP.
> > (lower blocksize = more chance to unplug memory).
> > 
> > Note: Currently, linux only supports a 4MB virtio blocksize; 2MB
> > support is currently in progress.
> > 
> 
> For the above paragraphs:
> s/GB/GiB/
> s/MB/MiB/
> ?

yes, I'll fix it in all patches

(...)

> >  
> > +sub get_virtiomem_block_size {
> > +    my ($conf) = @_;
> > +
> > +    my $MAX_MEM = get_max_mem($conf);
> > +    my $static_memory = get_static_mem($conf);
> > +    my $memory = get_current_memory($conf->{memory});
> > +
> > +    #virtiomem can map 32000 block size.
> > +    #try to use lowest blocksize, lower = more chance to unplug
> > memory.
> > +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
> > +    #2MB is the minimum to be aligned with THP
> > +    $blocksize = 2**(ceil(log($blocksize)/log(2)));
> > +    $blocksize = 4 if $blocksize < 4;
> 
> Why suddenly 4?

I have added a note in the commit message:

> Note: Currently, linux only supports a 4MB virtio blocksize; 2MB support
> is currently in progress.
> 

So 2MB is valid on the qemu side, but the linux guest kernel doesn't
support it yet; you need to use a multiple of 4MB. You can remove/add
2 blocks of 2MB at the same time, but it doesn't seem to be atomic, so
I think it's better to use the minimum currently supported block size.
Maybe later, we could extend the virtio=X option to tell the supported
virtio version (virtio=1.1, virtio=1.2) and enable the supported
features?
 
 

> > +my sub balance_virtiomem {
> 
> This function is rather difficult to read. The "record errors and
> filter" logic really should be its own patch after the initial
> support.
> FWIW, I tried my best and it does seem fine :)
> 
> But it's not clear to me that we even want that logic? Is it really that
> common for qom-set to take so long to be worth introducing all this
> additional handling/complexity? Or should it just be a hard error if
> qom-set still didn't have an effect on a device after 5 seconds.
> 
From my tests, it can take 2-3 seconds on unplug on bigger setups. I'm
doing it in parallel to be faster, to avoid waiting nbdimm * 2-3 seconds.

> Would it actually be better to just fill up the first, then the second
> etc. as needed, rather than balancing? My gut feeling is that having
> fewer "active" devices is better. But this would have to be tested with
> some benchmarks of course.

Well, from a numa perspective, you really want to balance as much as
possible. (That's why, with classic hotplug, we add/remove dimms on each
socket alternately.)

That's the whole point of numa: reading the nearest memory attached to
the processor where the process is running.

That's a main advantage of virtio-mem vs ballooning (which doesn't
handle numa and removes pages randomly on any socket).







* Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-25  9:00     ` DERUMIER, Alexandre
@ 2023-01-25  9:54       ` Fiona Ebner
  2023-01-25 10:28         ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-25  9:54 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel, aderumier

On 25.01.23 at 10:00, DERUMIER, Alexandre wrote:
> On Tuesday, 24 January 2023 at 14:06 +0100, Fiona Ebner wrote:
>> On 04.01.23 at 07:43, Alexandre Derumier wrote:
>>> +sub get_virtiomem_block_size {
>>> +    my ($conf) = @_;
>>> +
>>> +    my $MAX_MEM = get_max_mem($conf);
>>> +    my $static_memory = get_static_mem($conf);
>>> +    my $memory = get_current_memory($conf->{memory});
>>> +
>>> +    #virtiomem can map 32000 block size.
>>> +    #try to use lowest blocksize, lower = more chance to unplug
>>> memory.
>>> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;
>>> +    #2MB is the minimum to be aligned with THP
>>> +    $blocksize = 2**(ceil(log($blocksize)/log(2)));
>>> +    $blocksize = 4 if $blocksize < 4;
>>
>> Why suddenly 4?
> 
> I have added a note in the commit message:
> 
>> Note: Currently, linux only supports a 4MB virtio blocksize; 2MB support
>> is currently in progress.
>>
> 
> So 2MB is valid on the qemu side, but the linux guest kernel doesn't
> support it yet; you need to use a multiple of 4MB. You can remove/add
> 2 blocks of 2MB at the same time, but it doesn't seem to be atomic, so
> I think it's better to use the minimum currently supported block size.

Sorry, I missed that.

> Maybe later, we could extend the virtio=X option to tell the supported
> virtio version (virtio=1.1, virtio=1.2) and enable the supported
> features?

If we really need to and can't detect it otherwise, sure.

>>> +my sub balance_virtiomem {
>>
>> This function is rather difficult to read. The "record errors and
>> filter" logic really should be its own patch after the initial
>> support.
>> FWIW, I tried my best and it does seem fine :)
>>
>> But it's not clear to me that we even want that logic? Is it really
>> that
>> common for qom-set to take so long to be worth introducing all this
>> additional handling/complexity? Or should it just be a hard error if
>> qom-set still didn't have an effect on a device after 5 seconds.
>>
> From my tests, it can take 2-3 seconds on unplug on bigger setups. I'm
> doing it in parallel to be faster, to avoid waiting nbdimm * 2-3 seconds.

Sure, doing it in parallel is perfectly fine. I'm just thinking that
switching gears (too early) and redirecting the request might not be
ideal. You also issue an additional qom-set to go back to
$virtiomem->{current} * 1024 *1024 if a request didn't make progress in
time. But to be sure that request worked, we'd also need to monitor it
;) I think issuing that request is fine as-is, but if there is a
"hanging" device, we really can't do much. And I do think the user
should be informed via an appropriate error if there is a problematic
device.

Maybe we can use 10 seconds instead of 5 (2-3 seconds already sounds too
close to 5 IMHO), so that we have a good margin, and just die instead of
trying to redirect the request to another device. After issuing the
reset request and writing our best guess of what the memory is to the
config of course.

If it really is an issue in practice that certain devices often take too
long for whatever reason, we can still add the redirect logic. But
starting out, I feel like it's not worth the additional complexity.

>> Would it actually be better to just fill up the first, then the
>> second
>> etc. as needed, rather than balancing? My gut feeling is that having
>> fewer "active" devices is better. But this would have to be tested
>> with
>> some benchmarks of course.
> 
> Well, from a numa perspective, you really want to balance as much as
> possible. (That's why, with classic hotplug, we add/remove dimms on each
> socket alternately.)
> 
> That's the whole point of numa: reading the nearest memory attached to
> the processor where the process is running.
> 
> That's a main advantage of virtio-mem vs ballooning (which doesn't
> handle numa and removes pages randomly on any socket).

Makes sense. Thanks for the explanation!





* Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-25  9:54       ` Fiona Ebner
@ 2023-01-25 10:28         ` DERUMIER, Alexandre
  2023-01-25 10:52           ` Fiona Ebner
  0 siblings, 1 reply; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-25 10:28 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

> 
> Sure, doing it in parallel is perfectly fine. I'm just thinking that
> switching gears (too early) and redirecting the request might not be
> ideal. You also issue an additional qom-set to go back to
> $virtiomem->{current} * 1024 *1024 if a request didn't make progress in
> time. But to be sure that request worked, we'd also need to monitor it
> ;) I think issuing that request is fine as-is, but if there is a
> "hanging" device, we really can't do much. And I do think the user
> should be informed via an appropriate error if there is a problematic
> device.
> 
> Maybe we can use 10 seconds instead of 5 (2-3 seconds already sounds too
> close to 5 IMHO), so that we have a good margin, and just die instead of
> trying to redirect the request to another device. After issuing the
> reset request and writing our best guess of what the memory is to the
> config of course.
> 
I forgot to say that it doesn't time out 5s after the qom-set, but
times out after 5s if no memory change is detected by qom-get.
I'm resetting the retry counter if a change is detected.
(So 5s is really long; in practice, when it's blocking for 1s, it's
really blocking.)


if ($virtiomem->{current} != $virtiomem->{last}) {
    # the value has changed, but the target is not reached yet
    print "virtiomem$id: changed but did not reach target yet\n";
    $virtiomem->{retry} = 0;
    $virtiomem->{last} = $virtiomem->{current};
    next;
}

> If it really is an issue in practice that certain devices often take too
> long for whatever reason, we can still add the redirect logic. But
> starting out, I feel like it's not worth the additional complexity.
> 
The only real reason would be on unplug, if memory blocks are unmovable
(kernel reserved) or, with heavy fragmentation, no block is available.
(With 4MB granularity it's difficult to hit, but with a bigger maxmem
and bigger blocks we have more chances to trigger it; with 1GB
hugepages it's easier to trigger too, I think.)


But if you want something simpler,

I can do it like before: split the memory by the number of sockets,
and if we have an error on one socket, don't try to redispatch the
remaining blocks of this socket on other nodes.


> > > Would it actually be better to just fill up the first, then the
> > > second etc. as needed, rather than balancing? My gut feeling is that
> > > having fewer "active" devices is better. But this would have to be
> > > tested with some benchmarks of course.
> > 
> > Well, from a numa perspective, you really want to balance as much as
> > possible. (That's why, with classic hotplug, we add/remove dimms on
> > each socket alternately.)
> > 
> > That's the whole point of numa: reading the nearest memory attached
> > to the processor where the process is running.
> > 
> > That's a main advantage of virtio-mem vs ballooning (which doesn't
> > handle numa and removes pages randomly on any socket).
> 
> Makes sense. Thanks for the explanation!
> 



* Re: [pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support
  2023-01-25 10:28         ` DERUMIER, Alexandre
@ 2023-01-25 10:52           ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-25 10:52 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel, aderumier

On 25.01.23 at 11:28, DERUMIER, Alexandre wrote:
>> Sure, doing it in parallel is perfectly fine. I'm just thinking that
>> switching gears (too early) and redirecting the request might not be
>> ideal. You also issue an additional qom-set to go back to
>> $virtiomem->{current} * 1024 *1024 if a request didn't make progress in
>> time. But to be sure that request worked, we'd also need to monitor it
>> ;) I think issuing that request is fine as-is, but if there is a
>> "hanging" device, we really can't do much. And I do think the user
>> should be informed via an appropriate error if there is a problematic
>> device.
>>
>> Maybe we can use 10 seconds instead of 5 (2-3 seconds already sounds too
>> close to 5 IMHO), so that we have a good margin, and just die instead of
>> trying to redirect the request to another device. After issuing the
>> reset request and writing our best guess of what the memory is to the
>> config of course.
>>
> I forgot to say that it doesn't time out 5s after the qom-set, but
> times out after 5s if no memory change is detected by qom-get.
> I'm resetting the retry counter if a change is detected.
> (So 5s is really long; in practice, when it's blocking for 1s, it's
> really blocking.)

I saw that and thought the "it can take 2-3 seconds on unplug on bigger
setups" meant that it can take this long for a change to happen. Since
that's not the case, 5 seconds can be fine of course :)

>> If it really is an issue in practice that certain devices often take too
>> long for whatever reason, we can still add the redirect logic. But
>> starting out, I feel like it's not worth the additional complexity.
>>
> The only real reason would be on unplug, if memory blocks are unmovable
> (kernel reserved) or, with heavy fragmentation, no block is available.
> (With 4MB granularity it's difficult to hit, but with a bigger maxmem
> and bigger blocks we have more chances to trigger it; with 1GB
> hugepages it's easier to trigger too, I think.)
>
> But if you want something simpler,
> 
> I can do it like before: split the memory by the number of sockets,
> and if we have an error on one socket, don't try to redispatch the
> remaining blocks of this socket on other nodes.

Hope it's not too much work. I do think we should wait for some user
demand before adding the redispatch feature (and having it as a separate
patch helps with review and the git history). But feel free to include
it if you have already experienced the above-mentioned issue a few
times ;)





* Re: [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option
  2023-01-24 13:05   ` Fiona Ebner
@ 2023-01-27 15:03     ` DERUMIER, Alexandre
  2023-01-30  8:03       ` Fiona Ebner
  0 siblings, 1 reply; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-27 15:03 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

> 
> > +    # skip non hotpluggable value
> > +    if (safe_num_ne($newmem->{max}, $oldmem->{max})) {
> > +       die "skip\n";
> > +    }
> 
> Please move this to the call sites. The "die "skip""-logic should not
> cross function boundaries.
> 
> 
Just a note: this is exactly how it's done for nic && disk hotplug.

for example:
vmconfig_update_disk {
 ...
    # skip non hotpluggable value
   if (safe_string_ne($drive->{discard}, $old_drive->{discard}) ||
        safe_string_ne($drive->{iothread}, $old_drive->{iothread}) ||
        safe_string_ne($drive->{queues}, $old_drive->{queues}) ||
        safe_string_ne($drive->{cache}, $old_drive->{cache}) ||
        safe_string_ne($drive->{ssd}, $old_drive->{ssd}) ||
        safe_string_ne($drive->{ro}, $old_drive->{ro})) {
             die "skip\n";
   }





* Re: [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max
  2023-01-24 13:05   ` Fiona Ebner
@ 2023-01-27 15:15     ` DERUMIER, Alexandre
  2023-01-30  8:04       ` Fiona Ebner
  0 siblings, 1 reply; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-27 15:15 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

On Tuesday, 24 January 2023 at 14:05 +0100, Fiona Ebner wrote:
> > +my $check_memory_param = sub {
> > +    my ($conf, $param) = @_;
> > +
> > +    my $mem = parse_memory($param->{memory});
> > +    my $host_max_mem = get_host_max_mem($conf);
> > +
> > +    if ($mem->{max}) {
> > +       die "memory max can't be bigger than supported cpu
> > architecture $host_max_mem MB\n"
> 
> s/MB/MiB
> 
> > +           if $mem->{max} > $host_max_mem;
> 
> Style nit: you could && both conditions to save lines
> 
> This could be part of the verifier sub suggested in the last patch

I'm not sure we can use the verifier here, as we need to retrieve
$host_max_mem from the current configuration with get_host_max_mem($conf).


The current $check_memory_param in API2::Qemu is done in $updatefn,
after the lock on the config.





* Re: [pve-devel] [PATCH v2 qemu-server 6/9] memory: use 64 slots && static dimm size when max is defined
  2023-01-24 13:06   ` Fiona Ebner
@ 2023-01-27 15:52     ` DERUMIER, Alexandre
  0 siblings, 0 replies; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-27 15:52 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

> 
> Question about the existing code: The loops below can count up to
> $dimm_id 255, but in the commit message you say that there are at most
> 255 slots (so the highest ID is 254?). But yeah, it only becomes
> relevant when going all the way to approximately 4 TiB.
Yes, the maximum number of slots is 255 (id 0 -> id 254).
If I remember correctly (2015 ^_^), the last iteration (dimm id 255) of
the 8x32 loop was over the value supported by qemu or the config
(because of the static memory), so it always returns before reaching it.

> 
> > @@ -209,7 +216,7 @@ sub foreach_dimm{
> >             &$func($conf, $vmid, $name, $dimm_size, $numanode,
> > $current_size, $memory);
> >             return  $current_size if $current_size >= $memory;
> >         }
> > -       $dimm_size *= 2;
> > +       $dimm_size *= 2 if !$confmem->{max};
> >      }
> >  }
> >  
> > @@ -220,7 +227,12 @@ sub foreach_reverse_dimm {
> 
> Question about the existing code: There is
> my $dimm_id = 253;
> Shouldn't that start at 254 (highest valid ID we can count up to?).
> Again only becomes relevant with a lot of memory.
> 
Mmm, I really don't remember. I need to double check, but I think it
should be 254 indeed.

> >      my $current_size = 0;
> >      my $dimm_size = 0;
> >  
> > -    if($conf->{hugepages} && $conf->{hugepages} == 1024) {
> > +    my $confmem = parse_memory($conf->{memory});
> > +    if ($confmem->{max}) {
> > +       $dimm_id = $MAX_SLOTS - 1;
> > +       $current_size = $confmem->{max};
> 
> Does this need to be $confmem->{max} + $static_size? See below for a
> description of the issue. Didn't think about it in detail, so please
> double check ;)

Mmm, I wonder if I shouldn't lower the number of slots, as the "max"
option from the config is "static + x dimm slots", but in this case it
should depend on the number of sockets.

> > +       $dimm_size = $confmem->{max} / $MAX_SLOTS;
> > +    } elsif ($conf->{hugepages} && $conf->{hugepages} == 1024) {
> >         $current_size = 8355840;
> >         $dimm_size = 131072;
> >      } else {
> 
> Nit: the loops below here are
>     for (my $j = 0; $j < 8; $j++) {
>         for (my $i = 0; $i < 32; $i++) {
> so it looks like potentially iterating more often than $MAX_SLOTS and
> reaching negative $dimm_ids. I know that we should always return from
> the loop earlier than that, but maybe it can be improved by extracting
> the inner part in a sub/closure and using different loops depending on
> how many slots there are? Same applies to foreach_dimm().
> 
> 
Yes, I think that would be better.

> Real issue: something is wrong with the calculation for unplugging in
> combination with 'max' (it uses the wrong dimm IDs):
> 
> 
I have verified it myself; indeed, it's really buggy.

I'll work on it this weekend, thanks for the review!





* Re: [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option
  2023-01-27 15:03     ` DERUMIER, Alexandre
@ 2023-01-30  8:03       ` Fiona Ebner
  2023-01-30  8:45         ` DERUMIER, Alexandre
  0 siblings, 1 reply; 30+ messages in thread
From: Fiona Ebner @ 2023-01-30  8:03 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel, aderumier

On 27.01.23 at 16:03, DERUMIER, Alexandre wrote:
>>
>>> +    # skip non hotpluggable value
>>> +    if (safe_num_ne($newmem->{max}, $oldmem->{max})) {
>>> +       die "skip\n";
>>> +    }
>>
>> Please move this to the call sites. The "die "skip""-logic should not
>> cross function boundaries.
>>
>>
> Just a note: this is exactly how it's done for nic && disk hotplug.
> 
> for example:
> vmconfig_update_disk {
>  ...
>     # skip non hotpluggable value
>    if (safe_string_ne($drive->{discard}, $old_drive->{discard}) ||
>         safe_string_ne($drive->{iothread}, $old_drive->{iothread}) ||
>         safe_string_ne($drive->{queues}, $old_drive->{queues}) ||
>         safe_string_ne($drive->{cache}, $old_drive->{cache}) ||
>         safe_string_ne($drive->{ssd}, $old_drive->{ssd}) ||
>         safe_string_ne($drive->{ro}, $old_drive->{ro})) {
>              die "skip\n";
>    }
> 
> 

Well, I'm not a fan of that either. At least that is in the same module.

To keep the logic inside the Memory.pm module, we could add a
can_hotplug function and then in QemuServer.pm do

die "skip\n" if !PVE::Memory::can_hotplug($old, $new);

Like that, it's cleanly separated.
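
A minimal sketch of what that could look like (assuming can_hotplug()
takes the raw property strings; the exact signature is up for debate):

# in PVE/QemuServer/Memory.pm
sub can_hotplug {
    my ($old_value, $new_value) = @_;

    my $oldmem = parse_memory($old_value);
    my $newmem = parse_memory($new_value);

    # 'max' is not hotpluggable, so a change has to become a pending change
    return if safe_num_ne($newmem->{max}, $oldmem->{max});

    return 1;
}

# at the call site in QemuServer.pm
die "skip\n" if !PVE::QemuServer::Memory::can_hotplug($conf->{memory}, $value);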





* Re: [pve-devel] [PATCH v2 qemu-server 5/9] memory: get_max_mem: use config memory max
  2023-01-27 15:15     ` DERUMIER, Alexandre
@ 2023-01-30  8:04       ` Fiona Ebner
  0 siblings, 0 replies; 30+ messages in thread
From: Fiona Ebner @ 2023-01-30  8:04 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel, aderumier

On 27.01.23 at 16:15, DERUMIER, Alexandre wrote:
> On Tuesday, 24 January 2023 at 14:05 +0100, Fiona Ebner wrote:
>>> +my $check_memory_param = sub {
>>> +    my ($conf, $param) = @_;
>>> +
>>> +    my $mem = parse_memory($param->{memory});
>>> +    my $host_max_mem = get_host_max_mem($conf);
>>> +
>>> +    if ($mem->{max}) {
>>> +       die "memory max can't be bigger than supported cpu
>>> architecture $host_max_mem MB\n"
>>
>> s/MB/MiB
>>
>>> +           if $mem->{max} > $host_max_mem;
>>
>> Style nit: you could && both conditions to save lines
>>
>> This could be part of the verifier sub suggested in the last patch
> 
> I'm not sure we can use the verifier here, as we need to retrieve
> $host_max_mem from the current configuration with get_host_max_mem($conf).

You're right, I missed that.

> 
> 
> The current $check_memory_param in API2::Qemu is done in $updatefn,
> after the lock on the config.
> 
> 
> 





* Re: [pve-devel] [PATCH v2 qemu-server 4/9] config: memory: add 'max' option
  2023-01-30  8:03       ` Fiona Ebner
@ 2023-01-30  8:45         ` DERUMIER, Alexandre
  0 siblings, 0 replies; 30+ messages in thread
From: DERUMIER, Alexandre @ 2023-01-30  8:45 UTC (permalink / raw)
  To: pve-devel, aderumier, f.ebner

> > > 
> > Just a note: This is exactly how it's done on nic && disk hotplug.
> > 
> > for example:
> > vmconfig_update_disk {
> >  ...
> >     # skip non hotpluggable value
> >    if (safe_string_ne($drive->{discard}, $old_drive->{discard}) ||
> >         safe_string_ne($drive->{iothread}, $old_drive->{iothread})
> > ||
> >         safe_string_ne($drive->{queues}, $old_drive->{queues}) ||
> >         safe_string_ne($drive->{cache}, $old_drive->{cache}) ||
> >         safe_string_ne($drive->{ssd}, $old_drive->{ssd}) ||
> >         safe_string_ne($drive->{ro}, $old_drive->{ro})) {
> >              die "skip\n";
> >    }
> > 
> > 
> 
> Well, I'm not a fan of that either. At least that is in the same
> module.
> 
> To keep the logic inside the Memory.pm module, we could add a
> can_hotplug function and then in QemuServer.pm do
> 
> die "skip\n" if !PVE::Memory::can_hotplug($old, $new);
> 
> Like that, it's cleanly separated.
> 

Ok, no problem. I was going to do something like that.


I think I'll send a v3 tomorrow; I'm currently cleaning up the
virtio-mem patches.




