From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [RFC qemu-server] fix #6935: vmstatus: fallback to RSS in case of KSM usage
Date: Tue, 25 Nov 2025 15:20:52 +0100
Message-ID: <1764080351.cnbokh9rxb.astroid@yuna.none>
In-Reply-To: <7fe6a9b5-3950-4195-9aaf-8112fcf68aa4@proxmox.com>

On November 25, 2025 3:08 pm, Thomas Lamprecht wrote:
> Am 25.11.25 um 14:51 schrieb Fabian Grünbichler:
>> after a certain amount of KSM sharing, PSS lookups become prohibitively
>> expensive. fall back to RSS (which was used before) in that case, to avoid
>> vmstatus calls blocking for long periods of time.
>> 
>> I benchmarked this with 3 VMs running with different levels of KSM sharing. In
>> the output below, "merged pages" refers to the contents of
>> /proc/$pid/ksm_merging_pages, extract_pss is the parsing code for the
>> cumulative PSS of a VM cgroup run in isolation, extract_rss is the parsing
>> code for the cumulative RSS of a VM cgroup run in isolation, qm_status_stock
>> is `qm status $vmid --verbose`, and qm_status_patched is
>> `perl -I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.
>>
>> [..]
>> 
>> Fixes: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")
>> 
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> ---
>> 
>> Notes:
>>     the threshold is a bit arbitrary, we could also consider setting it
>>     lower to be on the safe side, or make it relative to the total
>>     number of pages of memory..
>>     
>>     one issue with this approach is that if KSM is disabled later on and
>>     all the merging is undone, the problematic behaviour remains, and
>>     there is - AFAICT - no trace of this state in `ksm_stat` of the
>>     process or elsewhere. the behaviour goes away if the VM is stopped
>>     and started again. instead of doing a per-pid decision, we might
>>     want to opt for setting a global RSS fallback in case KSM is
>>     detected as active on the host?
> 
> One can now also disable KSM per VM, so that config property should be
> checked too if we go that route.

right, if the running config has that set, we should be allowed to query
PSS.. that actually might be the nicest solution?
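
a rough sketch of how such a check could look - note that 'ksm' is just a
placeholder property name (I haven't checked what the actual per-VM option is
called), and this reads the on-disk config via load_config, while the running
config would have to be consulted instead, as discussed:

    use PVE::QemuConfig;

    # hypothetical sketch: only query PSS when KSM is known to be disabled
    # for this VM. 'ksm' is a placeholder property name, and the running
    # config (not the on-disk one) would need to be checked here.
    my sub pss_allowed_for_vm {
        my ($vmid) = @_;

        my $conf = PVE::QemuConfig->load_config($vmid);
        return defined($conf->{ksm}) && !$conf->{ksm};
    }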

>>     we should of course also investigate further whether this is fixable
>>     or improvable on the kernel side..
>> 
>>  src/PVE/QemuServer.pm | 21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>> 
>> diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
>> index a7fbec14..82e9c004 100644
>> --- a/src/PVE/QemuServer.pm
>> +++ b/src/PVE/QemuServer.pm
>> @@ -2333,19 +2333,28 @@ my sub get_vmid_total_cgroup_memory_usage {
>>      if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
> 
> 
> Just to be sure: The stats from memory.current or memory.stat inside the
> /sys/fs/cgroup/qemu.slice/${vmid}.scope/ directory is definitively not
> enough for our usecases?

well, if we go for RSS they might be; for PSS they are not, since PSS
doesn't exist there?
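
for reference, the cgroup-level counter would be a single read per VM
(assuming $vmid in scope, same path layout as the existing code), something
like this - a minimal sketch, keeping in mind that memory.current also counts
page cache and kernel memory charged to the cgroup:

    use PVE::Tools;

    # minimal sketch: one read per VM instead of a per-pid smaps_rollup walk.
    # memory.current includes page cache/kernel memory charged to the cgroup,
    # and there is no PSS equivalent at the cgroup level.
    my $current = PVE::Tools::file_read_firstline(
        "/sys/fs/cgroup/qemu.slice/${vmid}.scope/memory.current");
    my $mem_bytes = defined($current) ? int($current) : 0;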

> 
>>          while (my $pid = <$procs_fh>) {
>>              chomp($pid);
>> +            my $filename = 'smaps_rollup';
>> +            my $extract_usage_re = qr/^Pss:\s+([0-9]+) kB$/;
>>  
>> -            open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
>> +            my $ksm_pages = PVE::Tools::file_read_firstline("/proc/$pid/ksm_merging_pages");
>> +            # more than 1G shared via KSM, smaps_rollup will be slow, fall back to RSS
>> +            if ($ksm_pages && $ksm_pages > 1024 * 1024 / 4) {
> 
> Hmm, this can lead to sudden "jumps" in the rrd metrics data, but that's
> rather independent of the decision expression and always the case if we
> switch between the two. Dropping this stat completely again could also be
> an option.. A middle ground could be to display it only for the live view,
> with such a heuristic as proposed here.

having the live view and the metrics use different semantics seems kinda
confusing tbh..
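
for completeness, a minimal sketch of what the per-pid RSS fallback could look
like if we avoid smaps_rollup entirely and read VmRSS from /proc/$pid/status
instead - the rest of the hunk is cut off above, so this is just one possible
variant, not necessarily what the patch does (@pids stands for the pids read
from cgroup.procs as in the existing code):

    use PVE::Tools;

    # hedged sketch: sum VmRSS over all pids of the VM cgroup instead of
    # parsing PSS from smaps_rollup; /proc/$pid/status stays cheap to read
    # even when many pages are KSM-merged.
    my $rss_bytes = 0;
    for my $pid (@pids) {
        my $status = eval { PVE::Tools::file_get_contents("/proc/$pid/status") } // '';
        $rss_bytes += $1 * 1024 if $status =~ m/^VmRSS:\s+(\d+)\s+kB$/m;
    }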


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
