From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server] fix #6935: vmstatus: use CGroup for host memory usage
Date: Wed, 17 Dec 2025 10:36:34 +0100 [thread overview]
Message-ID: <1765964137.srayifogt6.astroid@yuna.none> (raw)
In-Reply-To: <20251128103639.446372-1-f.gruenbichler@proxmox.com>
ping - we get semi-regular reports of people running into this since
upgrading..
On November 28, 2025 11:36 am, Fabian Grünbichler wrote:
> after a certain amount of KSM sharing, PSS lookups become prohibitively
> expensive. instead of reverting to the old broken method, simply use the
> cgroup's memory usage as `memhost` value.
>
> this no longer accounts for pages merged by KSM.
>
> I benchmarked this with 4 VMs running with different levels of KSM sharing. in
> the output below, "merged pages" refers to the contents of
> /proc/$pid/ksm_merging_pages, and the extract_* benchmark runs refer to four
> different variants of extracting memory usage, with the actual extraction part
> running 1000x in a loop for each run to amortize perl/process setup costs.
> qm_status_stock is `qm status $vmid --verbose`, and qm_status_patched is `perl
> -I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.
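>
> roughly, each extract_* run is a small wrapper along these lines (a sketch, not
> the actual benchmark script; shown here for the _rss variant, and taking the
> main QEMU PID as an argument for simplicity):
>
>     #!/usr/bin/perl
>     # sketch of one benchmark wrapper: setup happens once, only the
>     # extraction itself is repeated 1000x to amortize perl/process startup
>     use strict;
>     use warnings;
>
>     my $main_pid = shift or die "usage: $0 <main-qemu-pid>\n";
>
>     # extract_rss: VmRSS of a single process, in bytes
>     sub extract_rss {
>         my ($pid) = @_;
>         open(my $fh, '<', "/proc/$pid/status") or return 0;
>         while (my $line = <$fh>) {
>             return int($1) * 1024 if $line =~ m/^VmRSS:\s+(\d+) kB/;
>         }
>         return 0;
>     }
>
>     my $rss = 0;
>     for (1 .. 1000) {
>         $rss = extract_rss($main_pid);
>     }
>     print "rss: $rss\n";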
>
> the variants:
> - extract_pss: status before this patch, query smaps_rollup for each process
> that is part of the qemu.slice of the VM
> - extract_rss: extract VmRSS from the `/proc/$pid/status` file of the main
> process
> - extract_rss_cgroup: like _rss, but for each process of the slice
> - extract_cgroup: use PVE::QemuServer::CGroup's get_memory_stat() (this patch;
> see the sketch below)
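>
> the _cgroup and _rss_cgroup variants boil down to something like this (a
> minimal sketch, not the actual benchmark code; sub names just mirror the
> variant labels, error handling is simplified, and the usual
> qemu.slice/<vmid>.scope layout is assumed - extract_pss corresponds to the
> get_vmid_total_cgroup_memory_usage helper removed in the diff below):
>
>     use strict;
>     use warnings;
>
>     use PVE::QemuServer::CGroup;
>
>     # extract_cgroup: what this patch uses for 'memhost'
>     sub extract_cgroup {
>         my ($vmid) = @_;
>         my $cgroup = PVE::QemuServer::CGroup->new($vmid);
>         return $cgroup->get_memory_stat()->{mem} // 0;
>     }
>
>     # extract_rss_cgroup: sum VmRSS over every PID in the VM's scope
>     sub extract_rss_cgroup {
>         my ($vmid) = @_;
>         my $total = 0;
>         open(my $procs, '<', "/sys/fs/cgroup/qemu.slice/$vmid.scope/cgroup.procs")
>             or return 0;
>         while (my $pid = <$procs>) {
>             chomp $pid;
>             open(my $status, '<', "/proc/$pid/status") or next;
>             while (my $line = <$status>) {
>                 if ($line =~ m/^VmRSS:\s+(\d+) kB/) {
>                     $total += int($1) * 1024;
>                     last;
>                 }
>             }
>         }
>         return $total;
>     }
>
> plugged into the same 1000x loop as the wrapper above.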
>
> first, with no KSM active
>
> VMID: 113
>
> pss: 724971520
> rss: 733282304
> cgroup: 727617536
> rss-cgroup: 733282304
>
> Benchmark 1: extract_pss 271.2 ms ± 6.0 ms [User: 226.3 ms, System: 44.7 ms]
> Benchmark 2: extract_rss 267.8 ms ± 3.6 ms [User: 223.9 ms, System: 43.7 ms]
> Benchmark 3: extract_cgroup 273.5 ms ± 6.2 ms [User: 227.2 ms, System: 46.2 ms]
> Benchmark 4: extract_rss_cgroup 270.5 ms ± 3.7 ms [User: 225.0 ms, System: 45.3 ms]
>
> both reported usage and runtime are in the same ballpark for all variants
>
> VMID: 838383 (with 48G of memory):
>
> pss: 40561564672
> rss: 40566108160
> cgroup: 40961339392
> rss-cgroup: 40572141568
>
> usage in the same ballpark
>
> Benchmark 1: extract_pss 732.0 ms ± 4.4 ms [User: 224.8 ms, System: 506.8 ms]
> Benchmark 2: extract_rss 272.1 ms ± 5.2 ms [User: 227.8 ms, System: 44.0 ms]
> Benchmark 3: extract_cgroup 274.2 ms ± 2.2 ms [User: 227.8 ms, System: 46.2 ms]
> Benchmark 4: extract_rss_cgroup 270.9 ms ± 3.9 ms [User: 224.9 ms, System: 45.8 ms]
>
> but PSS extraction is already a lot slower..
>
> Benchmark 1: qm_status_stock 820.9 ms ± 7.5 ms [User: 293.1 ms, System: 523.3 ms]
> Benchmark 2: qm_status_patched 356.2 ms ± 5.6 ms [User: 290.2 ms, System: 61.5 ms]
>
> which is also visible in the before (stock) and after (patched) `qm status` runs
>
> the other two VMs behaved like VM 113
>
> and now with KSM active
>
> VMID: 113
> merged pages: 10747 (very few)
>
> pss: 559815680
> rss: 594853888
> cgroup: 568197120
> rss-cgroup: 594853888
>
> Benchmark 1: extract_pss 280.0 ms ± 2.4 ms [User: 229.5 ms, System: 50.2 ms]
> Benchmark 2: extract_rss 274.8 ms ± 3.7 ms [User: 225.9 ms, System: 48.7 ms]
> Benchmark 3: extract_cgroup 279.0 ms ± 4.6 ms [User: 228.0 ms, System: 50.7 ms]
> Benchmark 4: extract_rss_cgroup 274.7 ms ± 6.7 ms [User: 228.0 ms, System: 46.4 ms]
>
> still in the same ballpark
>
> VMID: 838383 (with 48G of memory)
> merged pages: 6696434 (a lot - this is 25G worth of pages!)
>
> pss: 12411169792
> rss: 38772117504
> cgroup: 12799062016
> rss-cgroup: 38778150912
>
> the RSS-based numbers are roughly the same as without KSM, but the cgroup gives
> us almost the same numbers as PSS despite KSM being active!
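>
> as a sanity check, the merged page count roughly lines up with the gap between
> rss and cgroup here:
>
>     6696434 merged pages * 4 KiB/page        ~= 25.5 GiB
>     rss - cgroup = 38772117504 - 12799062016 ~= 24.2 GiB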
>
> Benchmark 1: extract_pss 691.7 ms ± 3.4 ms [User: 225.5 ms, System: 465.8 ms]
> Benchmark 2: extract_rss 276.3 ms ± 7.1 ms [User: 227.4 ms, System: 48.6 ms]
> Benchmark 3: extract_cgroup 277.8 ms ± 4.4 ms [User: 228.5 ms, System: 49.1 ms]
> Benchmark 4: extract_rss_cgroup 274.7 ms ± 3.5 ms [User: 226.6 ms, System: 47.8 ms]
>
> but it is still fast!
>
> Benchmark 1: qm_status_stock 771.8 ms ± 7.2 ms [User: 296.0 ms, System: 471.0 ms]
> Benchmark 2: qm_status_patched 360.2 ms ± 5.1 ms [User: 287.1 ms, System: 68.5 ms]
>
> confirmed by `qm status` as well
>
> VMID: 838384
> merged pages: 165540 (few - this is about 645MB worth of pages)
>
> pss: 2522527744
> rss: 2927058944
> cgroup: 2500329472
> rss-cgroup: 2932944896
>
> Benchmark 1: extract_pss 318.4 ms ± 3.6 ms [User: 227.3 ms, System: 90.8 ms]
> Benchmark 2: extract_rss 273.9 ms ± 5.8 ms [User: 226.5 ms, System: 47.2 ms]
> Benchmark 3: extract_cgroup 276.3 ms ± 4.1 ms [User: 225.4 ms, System: 50.7 ms]
> Benchmark 4: extract_rss_cgroup 276.5 ms ± 8.6 ms [User: 226.1 ms, System: 50.1 ms]
>
> Benchmark 1: qm_status_stock 400.2 ms ± 6.6 ms [User: 292.1 ms, System: 103.5 ms]
> Benchmark 2: qm_status_patched 357.0 ms ± 4.1 ms [User: 288.7 ms, System: 63.7 ms]
>
> results match those of 838383, just with a smaller effect
>
> the fourth VM matches this as well.
>
> Fixes/Reverts: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>
> Notes:
> given the numbers, going with the CGroup-based approach seems best - it gives
> us accurate numbers without the slowdown, and gives users insight into how
> KSM affects their guests' host memory usage without flip-flopping.
>
> src/PVE/QemuServer.pm | 35 ++++-------------------------------
> 1 file changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index a7fbec14..62d835a5 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -2324,35 +2324,6 @@ sub vzlist {
> return $vzlist;
> }
>
> -# Iterate over all PIDs inside a VMID's cgroup slice and accumulate their PSS (proportional set
> -# size) to get a relatively telling effective memory usage of all processes involved with a VM.
> -my sub get_vmid_total_cgroup_memory_usage {
> - my ($vmid) = @_;
> -
> - my $memory_usage = 0;
> - if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
> - while (my $pid = <$procs_fh>) {
> - chomp($pid);
> -
> - open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
> - or $!{ENOENT}
> - or die "failed to open PSS memory-stat from process - $!\n";
> - next if !defined($smaps_fh);
> -
> - while (my $line = <$smaps_fh>) {
> - if ($line =~ m/^Pss:\s+([0-9]+) kB$/) {
> - $memory_usage += int($1) * 1024;
> - last; # end inner while loop, go to next $pid
> - }
> - }
> - close $smaps_fh;
> - }
> - close($procs_fh);
> - }
> -
> - return $memory_usage;
> -}
> -
> our $vmstatus_return_properties = {
> vmid => get_standard_option('pve-vmid'),
> status => {
> @@ -2614,9 +2585,11 @@ sub vmstatus {
>
> $d->{uptime} = int(($uptime - $pstat->{starttime}) / $cpuinfo->{user_hz});
>
> - $d->{memhost} = get_vmid_total_cgroup_memory_usage($vmid);
> + my $cgroup = PVE::QemuServer::CGroup->new($vmid);
> + my $cgroup_mem = $cgroup->get_memory_stat();
> + $d->{memhost} = $cgroup_mem->{mem} // 0;
>
> - $d->{mem} = $d->{memhost}; # default to cgroup PSS sum, balloon info can override this below
> + $d->{mem} = $d->{memhost}; # default to cgroup, balloon info can override this below
>
> my $pressures = PVE::ProcFSTools::read_cgroup_pressure("qemu.slice/${vmid}.scope");
> $d->{pressurecpusome} = $pressures->{cpu}->{some}->{avg10} * 1;
> --
> 2.47.3
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel