Subject: [pve-devel] [PATCH qemu-server] fix #6935: vmstatus: use CGroup for host memory usage
Date: 2025-11-28 10:36 UTC
From: Fabian Grünbichler
To: pve-devel

after a certain amount of KSM sharing, PSS lookups become prohibitively
expensive. instead of reverting to the old broken method, simply use the
cgroup's memory usage as the `memhost` value.

this no longer accounts for pages merged by KSM.

I benchmarked this with 4 VMs running with different levels of KSM sharing. in
the output below, "merged pages" refers to the contents of
/proc/$pid/ksm_merging_pages, the extract_* benchmark runs refer to four
different variants of extracting memory usage, with the actual extraction part
running 1000x in a loop for each run to amortize perl/process setup costs,
qm_status_stock is `qm status $vmid --verbose`, and qm_status_patched is `perl
-I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.

the variants:
- extract_pss: status before this patch, query smaps_rollup for each process
  that is part of the qemu.slice of the VM
- extract_rss: extract VmRSS from the `/proc/$pid/status` file of the main
  process
- extract_rss_cgroup: like _rss, but for each process of the slice
- extract_cgroup: use PVE::QemuServer::CGroup's get_memory_stat (this patch, see sketch below)
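
for reference, a rough sketch of two of the variants - this is illustrative
and not the actual benchmark script; the CGroup calls match the patch below,
the naming and driver loop are mine:

    use strict;
    use warnings;

    use PVE::QemuServer::CGroup;

    # extract_rss: read VmRSS from /proc/$pid/status of the main process
    sub extract_rss {
        my ($pid) = @_;
        open(my $fh, '<', "/proc/${pid}/status") or die "failed to open status - $!\n";
        while (my $line = <$fh>) {
            return int($1) * 1024 if $line =~ m/^VmRSS:\s+(\d+) kB/;
        }
        return 0;
    }

    # extract_cgroup: what this patch uses for memhost
    sub extract_cgroup {
        my ($vmid) = @_;
        my $cgroup = PVE::QemuServer::CGroup->new($vmid);
        return $cgroup->get_memory_stat()->{mem} // 0;
    }

    # each benchmark run repeats the extraction 1000x to amortize
    # perl/process setup costs
    my $vmid = shift @ARGV;
    extract_cgroup($vmid) for (1 .. 1000);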

first, with no KSM active

VMID: 113

pss:        724971520
rss:        733282304
cgroup:     727617536
rss_cgroup: 733282304

Benchmark 1: extract_pss        271.2 ms ±   6.0 ms    [User: 226.3 ms, System: 44.7 ms]
Benchmark 2: extract_rss        267.8 ms ±   3.6 ms    [User: 223.9 ms, System: 43.7 ms]
Benchmark 3: extract_cgroup     273.5 ms ±   6.2 ms    [User: 227.2 ms, System: 46.2 ms]
Benchmark 4: extract_rss_cgroup 270.5 ms ±   3.7 ms    [User: 225.0 ms, System: 45.3 ms]

both reported usage and runtime are in the same ballpark

VMID: 838383 (with 48G of memory):

pss:        40561564672
rss:        40566108160
cgroup:     40961339392
rss-cgroup: 40572141568

usage in the same ballpark

Benchmark 1: extract_pss        732.0 ms ±   4.4 ms    [User: 224.8 ms, System: 506.8 ms]
Benchmark 2: extract_rss        272.1 ms ±   5.2 ms    [User: 227.8 ms, System: 44.0 ms]
Benchmark 3: extract_cgroup     274.2 ms ±   2.2 ms    [User: 227.8 ms, System: 46.2 ms]
Benchmark 4: extract_rss_cgroup 270.9 ms ±   3.9 ms    [User: 224.9 ms, System: 45.8 ms]

but PSS extraction is already a lot slower...

Benchmark 1: qm_status_stock   820.9 ms ±   7.5 ms    [User: 293.1 ms, System: 523.3 ms]
Benchmark 2: qm_status_patched 356.2 ms ±   5.6 ms    [User: 290.2 ms, System: 61.5 ms]

which is also visible in the before/after comparison of `qm status`

the other two VMs behaved the same as 113

and now with KSM active

VMID: 113
merged pages: 10747 (very little)

pss:        559815680
rss:        594853888
cgroup:     568197120
rss-cgroup: 594853888

Benchmark 1: extract_pss        280.0 ms ±   2.4 ms    [User: 229.5 ms, System: 50.2 ms]
Benchmark 2: extract_rss        274.8 ms ±   3.7 ms    [User: 225.9 ms, System: 48.7 ms]
Benchmark 3: extract_cgroup     279.0 ms ±   4.6 ms    [User: 228.0 ms, System: 50.7 ms]
Benchmark 4: extract_rss_cgroup 274.7 ms ±   6.7 ms    [User: 228.0 ms, System: 46.4 ms]

still in the same ballpark

VMID: 838383 (with 48G of memory)
merged pages: 6696434 (a lot - this is 25G worth of pages!)
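(assuming the standard 4 KiB page size: 6696434 * 4 KiB ≈ 25.5 GiB)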

pss:        12411169792
rss:        38772117504
cgroup:     12799062016
rss-cgroup: 38778150912

the RSS-based numbers are roughly the same as without KSM, but cgroup gives us
almost the same numbers as PSS despite KSM being active!

Benchmark 1: extract_pss        691.7 ms ±   3.4 ms    [User: 225.5 ms, System: 465.8 ms]
Benchmark 2: extract_rss        276.3 ms ±   7.1 ms    [User: 227.4 ms, System: 48.6 ms]
Benchmark 3: extract_cgroup     277.8 ms ±   4.4 ms    [User: 228.5 ms, System: 49.1 ms]
Benchmark 4: extract_rss_cgroup 274.7 ms ±   3.5 ms    [User: 226.6 ms, System: 47.8 ms]

but it is still fast!

Benchmark 1: qm_status_stock   771.8 ms ±   7.2 ms    [User: 296.0 ms, System: 471.0 ms]
Benchmark 2: qm_status_patched 360.2 ms ±   5.1 ms    [User: 287.1 ms, System: 68.5 ms]

confirmed by `qm status` as well

VMID: 838384
merged pages: 165540 (little, this is about 645MB worth of pages)

pss:        2522527744
rss:        2927058944
cgroup:     2500329472
rss-cgroup: 2932944896

Benchmark 1: extract_pss        318.4 ms ±   3.6 ms    [User: 227.3 ms, System: 90.8 ms]
Benchmark 2: extract_rss        273.9 ms ±   5.8 ms    [User: 226.5 ms, System: 47.2 ms]
Benchmark 3: extract_cgroup     276.3 ms ±   4.1 ms    [User: 225.4 ms, System: 50.7 ms]
Benchmark 4: extract_rss_cgroup 276.5 ms ±   8.6 ms    [User: 226.1 ms, System: 50.1 ms]

Benchmark 1: qm_status_stock   400.2 ms ±   6.6 ms    [User: 292.1 ms, System: 103.5 ms]
Benchmark 2: qm_status_patched 357.0 ms ±   4.1 ms    [User: 288.7 ms, System: 63.7 ms]

results match those of 838383, just with a smaller effect

the fourth VM matches this as well.

Fixes/Reverts: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    given the numbers, going with the CGroup-based approach seems best - it gives
    us accurate numbers without the slowdown, and gives users insight into how
    KSM affects their guests' host memory usage without flip-flopping.
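
    for context, on a cgroup v2 host the value used here essentially corresponds
    to what the kernel exposes for the VM's scope - roughly the following sketch
    (the actual get_memory_stat() implementation may account for swap/caches
    differently, so treat this as an approximation):

        my $vmid = 113; # example VMID
        my $path = "/sys/fs/cgroup/qemu.slice/${vmid}.scope/memory.current";
        open(my $fh, '<', $path) or die "failed to open $path - $!\n";
        chomp(my $mem = <$fh>);
        close($fh);
        print "host memory usage: $mem bytes\n";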

 src/PVE/QemuServer.pm | 35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index a7fbec14..62d835a5 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -2324,35 +2324,6 @@ sub vzlist {
     return $vzlist;
 }
 
-# Iterate over all PIDs inside a VMID's cgroup slice and accumulate their PSS (proportional set
-# size) to get a relatively telling effective memory usage of all processes involved with a VM.
-my sub get_vmid_total_cgroup_memory_usage {
-    my ($vmid) = @_;
-
-    my $memory_usage = 0;
-    if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
-        while (my $pid = <$procs_fh>) {
-            chomp($pid);
-
-            open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
-                or $!{ENOENT}
-                or die "failed to open PSS memory-stat from process - $!\n";
-            next if !defined($smaps_fh);
-
-            while (my $line = <$smaps_fh>) {
-                if ($line =~ m/^Pss:\s+([0-9]+) kB$/) {
-                    $memory_usage += int($1) * 1024;
-                    last; # end inner while loop, go to next $pid
-                }
-            }
-            close $smaps_fh;
-        }
-        close($procs_fh);
-    }
-
-    return $memory_usage;
-}
-
 our $vmstatus_return_properties = {
     vmid => get_standard_option('pve-vmid'),
     status => {
@@ -2614,9 +2585,11 @@ sub vmstatus {
 
         $d->{uptime} = int(($uptime - $pstat->{starttime}) / $cpuinfo->{user_hz});
 
-        $d->{memhost} = get_vmid_total_cgroup_memory_usage($vmid);
+        my $cgroup = PVE::QemuServer::CGroup->new($vmid);
+        my $cgroup_mem = $cgroup->get_memory_stat();
+        $d->{memhost} = $cgroup_mem->{mem} // 0;
 
-        $d->{mem} = $d->{memhost}; # default to cgroup PSS sum, balloon info can override this below
+        $d->{mem} = $d->{memhost}; # default to cgroup, balloon info can override this below
 
         my $pressures = PVE::ProcFSTools::read_cgroup_pressure("qemu.slice/${vmid}.scope");
         $d->{pressurecpusome} = $pressures->{cpu}->{some}->{avg10} * 1;
-- 
2.47.3


