From: Stephane Chazelas <stephane@chazelas.org>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH] ZFS ARC size not taken into account by pvestatd or ksmtuned
Date: Tue, 19 Jan 2021 14:42:35 +0000
Message-ID: <20210119144235.54f7jofljgjqpbts@chazelas.org>

[note that I'm not subscribed to the list]

Hello,

I've been meaning to send this for years. Sorry for the delay.

We've been maintaining the patch below on our servers for years
now (since 2015), even before ZFS was officially supported by
PVE.

We had experienced VM balloons swelling and processes in VMs
running out of memory even though the host had tons of RAM.

We had tracked that down to pvestatd reclaiming memory from the
VMs. pvestatd targets 80% memory utilisation (in terms of memory
that is not free and not in buffers or caches: (memtotal -
memfree - buffers - cached) / memtotal).
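
As a rough illustration of that formula (this is not pvestatd's
actual code), the ratio can be computed from /proc/meminfo with awk.
The function name used_pct and its optional file argument are made
up for this example:

```shell
# Hypothetical helper: percentage of RAM that is neither free nor in
# buffers/caches, i.e. (memtotal - memfree - buffers - cached) / memtotal.
# The optional argument (default: /proc/meminfo) only exists so the
# function can be tried against a sample file.
used_pct() {
    awk '
        /^MemTotal:/ { total = $2 }
        # "SwapCached:" does not match: the pattern is anchored at line start
        /^(MemFree|Buffers|Cached):/ { free += $2 }
        END { if (total > 0) printf "%d\n", (total - free) * 100 / total }
    ' "${1:-/proc/meminfo}"
}
```

Called without arguments it reports the figure for the running host.
Note that nothing in it knows about the ZFS ARC.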

The problem is that the ZFS ARC is tracked independently (not as
part of "buffers" or "cached" above).

The ARC also adapts its size to memory pressure. But here, since
autoballooning frees memory as soon as the ARC uses it up, the
ARC grows and grows while VMs access their disks, and we end up
with plenty of wasted free memory that is never used.

So in the end, with an ARC allowed to grow up to half the RAM,
we end up in a situation where pvestatd in effect targets 30%
max memory utilisation (with 20% free or in buffers and 50% in
ARC).
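
To make that arithmetic concrete, here is a sketch that computes the
same ratio while counting the ARC as reclaimable. It assumes both
input files exist; used_pct_arc and its two arguments are names
invented for this example, and it mirrors the idea of the patch
below rather than any existing PVE code:

```shell
# Hypothetical helper: utilisation with the ZFS ARC counted as reclaimable.
# $1: meminfo-style file (values in kB)
# $2: arcstats-style file (the "size" row is in bytes)
used_pct_arc() {
    awk '
        NR == FNR {
            # first file: /proc/meminfo layout
            if (/^MemTotal:/) total = $2
            if (/^(MemFree|Buffers|Cached):/) free += $2
            next
        }
        # second file: arcstats layout, "size <type> <bytes>"
        $1 == "size" { free += int($3 / 1024) }
        END { if (total > 0) printf "%d\n", (total - free) * 100 / total }
    ' "${1:-/proc/meminfo}" "${2:-/proc/spl/kstat/zfs/arcstats}"
}
```

With a sample meminfo where 20% of the RAM is free or in
buffers/caches and an arcstats whose "size" is half the RAM, this
prints 30, while the formula that ignores the ARC reports 80 for
the same host.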

Something similar happens for KSM (memory page deduplication).
/usr/sbin/ksmtuned monitors memory utilisation (again
total-cached-buffers-free) against kvm process memory
allocation, and tells the ksm daemon to scan more and more
pages, more and more aggressively as long as the "used" memory
is above 80%.

That probably explains why performance decreases significantly
after a while and why running "echo 3 > /proc/sys/vm/drop_caches"
(which clears buffers, caches *AND* the ZFS ARC) gives a second
life to the system.

(by the way, a recent version of ProcFSTools.pm added a
read_pressure function, but it doesn't look like it's used
anywhere).

--- /usr/share/perl5/PVE/ProcFSTools.pm.distrib	2020-12-03 15:53:17.000000000 +0000
+++ /usr/share/perl5/PVE/ProcFSTools.pm	2021-01-19 13:44:42.480272044 +0000
@@ -268,6 +268,19 @@ sub read_meminfo {
 
     $res->{memtotal} = $d->{memtotal};
     $res->{memfree} =  $d->{memfree} + $d->{buffers} + $d->{cached};
+
+    # Add the ZFS ARC if any
+    if (my $fh_arc = IO::File->new("/proc/spl/kstat/zfs/arcstats", "r")) {
+	while (my $line = <$fh_arc>) {
+	    if ($line =~ m/^size .* (\d+)/) {
+	        # "size" already in bytes
+		$res->{memfree} += $1;
+		last;
+	    }
+	}
+	close($fh_arc);
+    }
+
     $res->{memused} = $res->{memtotal} - $res->{memfree};
 
     $res->{swaptotal} = $d->{swaptotal};
--- /usr/sbin/ksmtuned.distrib	2020-07-24 10:04:45.827828719 +0100
+++ /usr/sbin/ksmtuned	2021-01-19 14:37:43.416360037 +0000
@@ -75,10 +75,17 @@ committed_memory () {
     ps -C "$progname" -o vsz= | awk '{ sum += $1 }; END { print sum }'
 }
 
-free_memory () {
-    awk '/^(MemFree|Buffers|Cached):/ {free += $2}; END {print free}' \
-                /proc/meminfo
-}
+free_memory () (
+    shopt -s nullglob
+    exec awk '
+      NR == FNR {
+          if (/^(MemFree|Buffers|Cached):/) free += $2
+          next
+      }
+      $1 == "size" {free += int($3/1024)}
+      END {print free}
+      ' /proc/meminfo /proc/spl/kstat/zfs/[a]rcstats
+)
 
 increase_npages() {
     local delta



