From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server] fix #6935: vmstatus: use CGroup for host memory usage
Date: Wed, 17 Dec 2025 10:36:34 +0100
Message-ID: <1765964137.srayifogt6.astroid@yuna.none>
In-Reply-To: <20251128103639.446372-1-f.gruenbichler@proxmox.com>
ping - we get semi-regular reports of people running into this since
upgrading..
On November 28, 2025 11:36 am, Fabian Grünbichler wrote:
> after a certain amount of KSM sharing, PSS lookups become prohibitively
> expensive. instead of reverting to the old broken method, simply use the
> cgroup's memory usage as the `memhost` value.
>
> this no longer accounts for pages merged by KSM.
>
> I benchmarked this with 4 VMs running with different levels of KSM sharing. in
> the output below, "merged pages" refers to the contents of
> /proc/$pid/ksm_merging_pages, and the extract_* benchmark runs refer to four
> different variants of extracting memory usage, with the actual extraction part
> running 1000x in a loop for each run to amortize perl/process setup costs.
> qm_status_stock is `qm status $vmid --verbose`, and qm_status_patched is `perl
> -I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.
>
> the variants:
> - extract_pss: status before this patch, query smaps_rollup for each process
> that is part of the qemu.slice of the VM
> - extract_rss: extract VmRSS from the `/proc/$pid/status` file of the main
> process
> - extract_rss_cgroup: like _rss, but for each process of the slice
> - extract_cgroup: use PVE::QemuServer::CGroup get_memory_stat (this patch)
>
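> to make the variants a bit more concrete, here is a minimal standalone sketch
> of the two simplest ones (illustrative only - this is neither the benchmark
> harness nor the code from the patch; it assumes the usual qemu-server pidfile
> location under /var/run/qemu-server and the qemu.slice cgroup v2 layout used
> elsewhere in this mail):
>
> #!/usr/bin/perl
> # sketch of extract_rss and extract_cgroup - not the actual benchmark code
> use strict;
> use warnings;
>
> # VmRSS of the main QEMU process, read from /proc/$pid/status
> sub extract_rss {
>     my ($vmid) = @_;
>     # pidfile path assumed to be the usual qemu-server location
>     open(my $pid_fh, '<', "/var/run/qemu-server/${vmid}.pid")
>         or die "no pidfile for VM ${vmid} - $!\n";
>     chomp(my $pid = <$pid_fh>);
>     close($pid_fh);
>
>     open(my $status_fh, '<', "/proc/${pid}/status")
>         or die "cannot open /proc/${pid}/status - $!\n";
>     while (my $line = <$status_fh>) {
>         return int($1) * 1024 if $line =~ m/^VmRSS:\s+([0-9]+) kB/;
>     }
>     return 0;
> }
>
> # total charged memory of the whole scope (cgroup v2 memory.current)
> sub extract_cgroup {
>     my ($vmid) = @_;
>     open(my $fh, '<', "/sys/fs/cgroup/qemu.slice/${vmid}.scope/memory.current")
>         or die "cannot open memory.current for VM ${vmid} - $!\n";
>     chomp(my $bytes = <$fh>);
>     close($fh);
>     return int($bytes);
> }
>
> my $vmid = shift // die "usage: $0 <vmid>\n";
> printf("rss: %d\ncgroup: %d\n", extract_rss($vmid), extract_cgroup($vmid));
>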
> first, with no KSM active
>
> VMID: 113
>
> pss: 724971520
> rss: 733282304
> cgroup: 727617536
> rss-cgroup: 733282304
>
> Benchmark 1: extract_pss 271.2 ms ± 6.0 ms [User: 226.3 ms, System: 44.7 ms]
> Benchmark 2: extract_rss 267.8 ms ± 3.6 ms [User: 223.9 ms, System: 43.7 ms]
> Benchmark 3: extract_cgroup 273.5 ms ± 6.2 ms [User: 227.2 ms, System: 46.2 ms]
> Benchmark 4: extract_rss_cgroup 270.5 ms ± 3.7 ms [User: 225.0 ms, System: 45.3 ms]
>
> both reported usage and runtime are in the same ballpark for all four variants
>
> VMID: 838383 (with 48G of memory):
>
> pss: 40561564672
> rss: 40566108160
> cgroup: 40961339392
> rss-cgroup: 40572141568
>
> usage in the same ballpark
>
> Benchmark 1: extract_pss 732.0 ms ± 4.4 ms [User: 224.8 ms, System: 506.8 ms]
> Benchmark 2: extract_rss 272.1 ms ± 5.2 ms [User: 227.8 ms, System: 44.0 ms]
> Benchmark 3: extract_cgroup 274.2 ms ± 2.2 ms [User: 227.8 ms, System: 46.2 ms]
> Benchmark 4: extract_rss_cgroup 270.9 ms ± 3.9 ms [User: 224.9 ms, System: 45.8 ms]
>
> but the PSS variant is already a lot slower..
>
> Benchmark 1: qm_status_stock 820.9 ms ± 7.5 ms [User: 293.1 ms, System: 523.3 ms]
> Benchmark 2: qm_status_patched 356.2 ms ± 5.6 ms [User: 290.2 ms, System: 61.5 ms]
>
> which is also visible in the `qm status` before and after above
>
> the other two VMs behaved like VMID 113
>
> and now with KSM active
>
> VMID: 113
> merged pages: 10747 (very little)
>
> pss: 559815680
> rss: 594853888
> cgroup: 568197120
> rss-cgroup: 594853888
>
> Benchmark 1: extract_pss 280.0 ms ± 2.4 ms [User: 229.5 ms, System: 50.2 ms]
> Benchmark 2: extract_rss 274.8 ms ± 3.7 ms [User: 225.9 ms, System: 48.7 ms]
> Benchmark 3: extract_cgroup 279.0 ms ± 4.6 ms [User: 228.0 ms, System: 50.7 ms]
> Benchmark 4: extract_rss_cgroup 274.7 ms ± 6.7 ms [User: 228.0 ms, System: 46.4 ms]
>
> still in the same ballpark
>
> VMID: 838383 (with 48G of memory)
> merged pages: 6696434 (a lot - this is 25G worth of pages!)
>
> pss: 12411169792
> rss: 38772117504
> cgroup: 12799062016
> rss-cgroup: 38778150912
>
> the RSS-based values are roughly the same as without KSM, but the cgroup
> value gives us almost the same number as PSS despite KSM being active!
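>
> (for reference, the "25G" above is just the page count times the usual 4 KiB
> page size: 6696434 pages * 4096 B ≈ 25.5 GiB; the same conversion yields the
> ~645MB figure for VMID 838384 below.)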
>
> Benchmark 1: extract_pss 691.7 ms ± 3.4 ms [User: 225.5 ms, System: 465.8 ms]
> Benchmark 2: extract_rss 276.3 ms ± 7.1 ms [User: 227.4 ms, System: 48.6 ms]
> Benchmark 3: extract_cgroup 277.8 ms ± 4.4 ms [User: 228.5 ms, System: 49.1 ms]
> Benchmark 4: extract_rss_cgroup 274.7 ms ± 3.5 ms [User: 226.6 ms, System: 47.8 ms]
>
> but the cgroup-based extraction is still fast!
>
> Benchmark 1: qm_status_stock 771.8 ms ± 7.2 ms [User: 296.0 ms, System: 471.0 ms]
> Benchmark 2: qm_status_patched 360.2 ms ± 5.1 ms [User: 287.1 ms, System: 68.5 ms]
>
> confirmed by `qm status` as well
>
> VMID: 838384
> merged pages: 165540 (little, this is about 645MB worth of pages)
>
> pss: 2522527744
> rss: 2927058944
> cgroup: 2500329472
> rss-cgroup: 2932944896
>
> Benchmark 1: extract_pss 318.4 ms ± 3.6 ms [User: 227.3 ms, System: 90.8 ms]
> Benchmark 2: extract_rss 273.9 ms ± 5.8 ms [User: 226.5 ms, System: 47.2 ms]
> Benchmark 3: extract_cgroup 276.3 ms ± 4.1 ms [User: 225.4 ms, System: 50.7 ms]
> Benchmark 4: extract_rss_cgroup 276.5 ms ± 8.6 ms [User: 226.1 ms, System: 50.1 ms]
>
> Benchmark 1: qm_status_stock 400.2 ms ± 6.6 ms [User: 292.1 ms, System: 103.5 ms]
> Benchmark 2: qm_status_patched 357.0 ms ± 4.1 ms [User: 288.7 ms, System: 63.7 ms]
>
> the results match those of 838383, just with a smaller effect
>
> the fourth VM matches this as well.
>
> Fixes/Reverts: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>
> Notes:
> given the numbers, going with the CGroup-based approach seems best - it gives
> us accurate numbers without the slowdown, and gives users insight into how
> KSM affects their guests' host memory usage without flip-flopping.
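>
> as an aside, the per-VM KSM effect can be eyeballed directly from
> ksm_merging_pages if needed - a rough sketch (assuming 4 KiB pages and the
> qemu.slice cgroup layout used above; not part of the patch):
>
> #!/usr/bin/perl
> # print KSM-merged pages (and approximate bytes) for each process in a
> # VM's qemu.slice scope - illustrative only, assumes 4 KiB pages
> use strict;
> use warnings;
>
> my $vmid = shift // die "usage: $0 <vmid>\n";
> open(my $procs_fh, '<', "/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs")
>     or die "cannot open cgroup.procs for VM ${vmid} - $!\n";
> while (my $pid = <$procs_fh>) {
>     chomp($pid);
>     open(my $ksm_fh, '<', "/proc/${pid}/ksm_merging_pages") or next;
>     chomp(my $pages = <$ksm_fh>);
>     close($ksm_fh);
>     printf("pid %d: %d merged pages (~%.1f MiB)\n", $pid, $pages, $pages * 4096 / (1024 * 1024));
> }
> close($procs_fh);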
>
> src/PVE/QemuServer.pm | 35 ++++-------------------------------
> 1 file changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> index a7fbec14..62d835a5 100644
> --- a/src/PVE/QemuServer.pm
> +++ b/src/PVE/QemuServer.pm
> @@ -2324,35 +2324,6 @@ sub vzlist {
>      return $vzlist;
>  }
>
> -# Iterate over all PIDs inside a VMID's cgroup slice and accumulate their PSS (proportional set
> -# size) to get a relatively telling effective memory usage of all processes involved with a VM.
> -my sub get_vmid_total_cgroup_memory_usage {
> -    my ($vmid) = @_;
> -
> -    my $memory_usage = 0;
> -    if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
> -        while (my $pid = <$procs_fh>) {
> -            chomp($pid);
> -
> -            open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
> -                or $!{ENOENT}
> -                or die "failed to open PSS memory-stat from process - $!\n";
> -            next if !defined($smaps_fh);
> -
> -            while (my $line = <$smaps_fh>) {
> -                if ($line =~ m/^Pss:\s+([0-9]+) kB$/) {
> -                    $memory_usage += int($1) * 1024;
> -                    last; # end inner while loop, go to next $pid
> -                }
> -            }
> -            close $smaps_fh;
> -        }
> -        close($procs_fh);
> -    }
> -
> -    return $memory_usage;
> -}
> -
>  our $vmstatus_return_properties = {
>      vmid => get_standard_option('pve-vmid'),
>      status => {
> @@ -2614,9 +2585,11 @@ sub vmstatus {
>
>          $d->{uptime} = int(($uptime - $pstat->{starttime}) / $cpuinfo->{user_hz});
>
> -        $d->{memhost} = get_vmid_total_cgroup_memory_usage($vmid);
> +        my $cgroup = PVE::QemuServer::CGroup->new($vmid);
> +        my $cgroup_mem = $cgroup->get_memory_stat();
> +        $d->{memhost} = $cgroup_mem->{mem} // 0;
>
> -        $d->{mem} = $d->{memhost}; # default to cgroup PSS sum, balloon info can override this below
> +        $d->{mem} = $d->{memhost}; # default to cgroup, balloon info can override this below
>
>          my $pressures = PVE::ProcFSTools::read_cgroup_pressure("qemu.slice/${vmid}.scope");
>          $d->{pressurecpusome} = $pressures->{cpu}->{some}->{avg10} * 1;
> --
> 2.47.3
>
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel