From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH qemu-server] fix #6935: vmstatus: use CGroup for host memory usage
Date: Fri, 28 Nov 2025 11:36:34 +0100 [thread overview]
Message-ID: <20251128103639.446372-1-f.gruenbichler@proxmox.com> (raw)
after a certain amount of KSM sharing, PSS lookups become prohibitively
expensive. instead of reverting to the old broken method, simply use the
cgroup's memory usage as the `memhost` value.
this no longer accounts for pages merged by KSM.
I benchmarked this with 4 VMs running with different levels of KSM sharing. in
the output below, "merged pages" refers to the contents of
/proc/$pid/ksm_merging_pages, the extract_* benchmark runs refer to four
different variants of extracting memory usage, with the actual extraction part
running 1000x in a loop for each run to amortize perl/process setup costs,
qm_status_stock is `qm status $vmid --verbose`, and qm_status_patched is `perl
-I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.
the variants:
- extract_pss: status before this patch, query smaps_rollup for each process
that is part of the qemu.slice of the VM
- extract_rss: extract VmRSS from the `/proc/$pid/status` file of the main
process
- extract_rss_cgroup: like _rss, but for each process of the slice
- extract_cgroup: use PVE::QemuServer::CGroup get_memory_stat (this patch)
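for illustration, the two extremes among these variants could be sketched
roughly as below in shell. note this is not the actual benchmark harness: on a
live system, extract_cgroup would read
/sys/fs/cgroup/qemu.slice/${vmid}.scope/memory.current and extract_pss would
sum the Pss: lines of /proc/$pid/smaps_rollup for every PID listed in the
slice's cgroup.procs; inline sample data stands in for both here, and the
sample values are made up:

```shell
#!/bin/sh
# sketch only - sample data instead of live /sys and /proc files

# extract_cgroup: cgroup v2 exposes the slice's total usage as a single value
sample_memory_current=727617536
echo "cgroup: ${sample_memory_current}"

# extract_pss: Pss is reported in kB, so convert to bytes while summing;
# on a real system this sum runs once per PID in the slice
sample_smaps_rollup='Rss:              716096 kB
Pss:              707986 kB
Shared_Clean:       8110 kB'
pss=$(printf '%s\n' "$sample_smaps_rollup" \
    | awk '/^Pss:/ { sum += $2 } END { print sum * 1024 }')
echo "pss: ${pss}"
```

the cgroup variant is a single file read per VM, while the PSS variant has to
walk smaps_rollup for every process in the slice, which is where the slowdown
under heavy KSM sharing comes from.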
first, with no KSM active
VMID: 113
pss: 724971520
rss: 733282304
cgroup: 727617536
rss_cgroup: 733282304
Benchmark 1: extract_pss 271.2 ms ± 6.0 ms [User: 226.3 ms, System: 44.7 ms]
Benchmark 2: extract_rss 267.8 ms ± 3.6 ms [User: 223.9 ms, System: 43.7 ms]
Benchmark 3: extract_cgroup 273.5 ms ± 6.2 ms [User: 227.2 ms, System: 46.2 ms]
Benchmark 4: extract_rss_cgroup 270.5 ms ± 3.7 ms [User: 225.0 ms, System: 45.3 ms]
both reported usage and runtime are in the same ballpark
VMID: 838383 (with 48G of memory):
pss: 40561564672
rss: 40566108160
cgroup: 40961339392
rss_cgroup: 40572141568
usage is in the same ballpark
Benchmark 1: extract_pss 732.0 ms ± 4.4 ms [User: 224.8 ms, System: 506.8 ms]
Benchmark 2: extract_rss 272.1 ms ± 5.2 ms [User: 227.8 ms, System: 44.0 ms]
Benchmark 3: extract_cgroup 274.2 ms ± 2.2 ms [User: 227.8 ms, System: 46.2 ms]
Benchmark 4: extract_rss_cgroup 270.9 ms ± 3.9 ms [User: 224.9 ms, System: 45.8 ms]
but PSS is already a lot slower..
Benchmark 1: qm_status_stock 820.9 ms ± 7.5 ms [User: 293.1 ms, System: 523.3 ms]
Benchmark 2: qm_status_patched 356.2 ms ± 5.6 ms [User: 290.2 ms, System: 61.5 ms]
which is also visible in the before and after comparison
the other two VMs behaved like 113
and now with KSM active
VMID: 113
merged pages: 10747 (very little)
pss: 559815680
rss: 594853888
cgroup: 568197120
rss_cgroup: 594853888
Benchmark 1: extract_pss 280.0 ms ± 2.4 ms [User: 229.5 ms, System: 50.2 ms]
Benchmark 2: extract_rss 274.8 ms ± 3.7 ms [User: 225.9 ms, System: 48.7 ms]
Benchmark 3: extract_cgroup 279.0 ms ± 4.6 ms [User: 228.0 ms, System: 50.7 ms]
Benchmark 4: extract_rss_cgroup 274.7 ms ± 6.7 ms [User: 228.0 ms, System: 46.4 ms]
still the same ballpark
VMID: 838383 (with 48G of memory)
merged pages: 6696434 (a lot - this is 25G worth of pages!)
pss: 12411169792
rss: 38772117504
cgroup: 12799062016
rss_cgroup: 38778150912
the RSS-based numbers are roughly the same, but cgroup gives us almost the
same number as PSS despite KSM being active!
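as a quick sanity check of the merged-pages figure above, assuming the usual
4 KiB page size:

```shell
# 6696434 merged pages (from /proc/$pid/ksm_merging_pages) at 4096 bytes each
pages=6696434
bytes=$((pages * 4096))
gib=$((bytes / 1024 / 1024 / 1024))
echo "${bytes} bytes = ~${gib} GiB"
# -> 27428593664 bytes = ~25 GiB
```

which matches the "25G worth of pages" estimate, and also roughly matches the
~26G gap between the rss and pss numbers for this VM.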
Benchmark 1: extract_pss 691.7 ms ± 3.4 ms [User: 225.5 ms, System: 465.8 ms]
Benchmark 2: extract_rss 276.3 ms ± 7.1 ms [User: 227.4 ms, System: 48.6 ms]
Benchmark 3: extract_cgroup 277.8 ms ± 4.4 ms [User: 228.5 ms, System: 49.1 ms]
Benchmark 4: extract_rss_cgroup 274.7 ms ± 3.5 ms [User: 226.6 ms, System: 47.8 ms]
but it is still fast!
Benchmark 1: qm_status_stock 771.8 ms ± 7.2 ms [User: 296.0 ms, System: 471.0 ms]
Benchmark 2: qm_status_patched 360.2 ms ± 5.1 ms [User: 287.1 ms, System: 68.5 ms]
confirmed by `qm status` as well
VMID: 838384
merged pages: 165540 (little, this is about 645MB worth of pages)
pss: 2522527744
rss: 2927058944
cgroup: 2500329472
rss_cgroup: 2932944896
Benchmark 1: extract_pss 318.4 ms ± 3.6 ms [User: 227.3 ms, System: 90.8 ms]
Benchmark 2: extract_rss 273.9 ms ± 5.8 ms [User: 226.5 ms, System: 47.2 ms]
Benchmark 3: extract_cgroup 276.3 ms ± 4.1 ms [User: 225.4 ms, System: 50.7 ms]
Benchmark 4: extract_rss_cgroup 276.5 ms ± 8.6 ms [User: 226.1 ms, System: 50.1 ms]
Benchmark 1: qm_status_stock 400.2 ms ± 6.6 ms [User: 292.1 ms, System: 103.5 ms]
Benchmark 2: qm_status_patched 357.0 ms ± 4.1 ms [User: 288.7 ms, System: 63.7 ms]
results match those of 838383, just with less effect
the fourth VM matches this as well.
Fixes/Reverts: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
Notes:
given the numbers, going with the CGroup-based approach seems best - it gives
us accurate numbers without the slowdown, and gives users insight into how
KSM affects their guests' host memory usage without flip-flopping.
src/PVE/QemuServer.pm | 35 ++++-------------------------------
1 file changed, 4 insertions(+), 31 deletions(-)
diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
index a7fbec14..62d835a5 100644
--- a/src/PVE/QemuServer.pm
+++ b/src/PVE/QemuServer.pm
@@ -2324,35 +2324,6 @@ sub vzlist {
return $vzlist;
}
-# Iterate over all PIDs inside a VMID's cgroup slice and accumulate their PSS (proportional set
-# size) to get a relatively telling effective memory usage of all processes involved with a VM.
-my sub get_vmid_total_cgroup_memory_usage {
- my ($vmid) = @_;
-
- my $memory_usage = 0;
- if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
- while (my $pid = <$procs_fh>) {
- chomp($pid);
-
- open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
- or $!{ENOENT}
- or die "failed to open PSS memory-stat from process - $!\n";
- next if !defined($smaps_fh);
-
- while (my $line = <$smaps_fh>) {
- if ($line =~ m/^Pss:\s+([0-9]+) kB$/) {
- $memory_usage += int($1) * 1024;
- last; # end inner while loop, go to next $pid
- }
- }
- close $smaps_fh;
- }
- close($procs_fh);
- }
-
- return $memory_usage;
-}
-
our $vmstatus_return_properties = {
vmid => get_standard_option('pve-vmid'),
status => {
@@ -2614,9 +2585,11 @@ sub vmstatus {
$d->{uptime} = int(($uptime - $pstat->{starttime}) / $cpuinfo->{user_hz});
- $d->{memhost} = get_vmid_total_cgroup_memory_usage($vmid);
+ my $cgroup = PVE::QemuServer::CGroup->new($vmid);
+ my $cgroup_mem = $cgroup->get_memory_stat();
+ $d->{memhost} = $cgroup_mem->{mem} // 0;
- $d->{mem} = $d->{memhost}; # default to cgroup PSS sum, balloon info can override this below
+ $d->{mem} = $d->{memhost}; # default to cgroup, balloon info can override this below
my $pressures = PVE::ProcFSTools::read_cgroup_pressure("qemu.slice/${vmid}.scope");
$d->{pressurecpusome} = $pressures->{cpu}->{some}->{avg10} * 1;
--
2.47.3
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel