From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Markus Frank <m.frank@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] config: adjust max_filters calculation to reflect current memory usage
Date: Mon, 15 Jan 2024 19:05:53 +0100
Message-ID: <20240115190553.5edb7637@rosa.proxmox.com>
In-Reply-To: <2e202f39-179a-4f4e-a837-63852add0668@proxmox.com>

hi,

a bit late to the discussion on-list - had a few chats with Dominik
off-list and did some minimal testing:

On Fri, 12 Jan 2024 11:13:57 +0100
Thomas Lamprecht <t.lamprecht@proxmox.com> wrote:

> On 10/01/2024 at 12:56, Markus Frank wrote:
> > One pmg-smtp-filter process uses at least 220 MiB.
> > When having 100000 rules one process can take up to 330 MiB.  
> 
> That's probably talking about RSS here, or? That would be rather useless as
> it re-counts the memory used by shared libraries, which bloats the number,
> as they're actually only loaded once in memory.
> 
> What is the newer PSS (Proportional Set Size) metric in that case?  As that
> would be a better metric due to actually accounting for the proportional use
> of shared libraries.
> 
> For example, use the following bash one-liner to get both PSS and RSS of
> each pmg-smtp-filter process:
> 
>  for pid in $(pidof pmg-smtp-filter); do printf "PID %s: " $pid; awk '/Pss:/{ pss += $2 } /Rss:/{ rss += $2 } END { print "PSS =", pss, " RSS =", rss }' "/proc/$pid/smaps"; done

TIL: PSS/USS - thanks! (also for the one-liner :)

> 
> Here, on a pretty much idle setup with only a handful of rules, I get:
> 
> PID 405810: PSS = 84700  RSS = 225000
> PID 405809: PSS = 84714  RSS = 225000


for comparison - two more heavily loaded production instances (also with
quite small rulesets) that I have access to:
PID 2908376: PSS = 114567  RSS = 227528
PID 2908355: PSS = 115023  RSS = 227776
and
PID 788678: PSS = 65242  RSS = 217564
PID 788600: PSS = 69126  RSS = 220656
PID 788507: PSS = 102765  RSS = 227596

> 
> As PSS is what matters here, the 84.7 MB are quite in line with the 120 MB
> $servermem, at least for my (underused) setup.

I tried to get a few more filter processes to spawn (while running the
one-liner from above under `watch -n2`) by:
* stopping postfix on the sending system
* queueing 250 (also 1250) tiny mails to one recipient there
* starting postfix (so it starts to drain the queue at once)

to get more than 2-3 filter processes I had to either add a sleep to the
processing (e.g. at analyze_virus_clam), or send a larger mail (with a pdf
attachment that gets handed through pdf2text).

With that (and a small ruleset, without huge numbers of objects) the
PSS size was between 31967 and 62725 kB for 20 parallel processes (some
PIDs occasionally did not yield one or the other metric - probably while
being started or torn down).
The numbers were similar for both types of mail (small text/plain only,
and with a ~156k pdf attachment).
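
To pull the min/max out of many such samples, the output lines can be
piped through a small filter. A minimal sketch, assuming the
"PID n: PSS = x  RSS = y" format printed by the one-liner; the function
name `pss_range` is hypothetical. Lines with a missing or zero PSS
(process just starting or exiting) are ignored.

```shell
# Read "PID n: PSS = x  RSS = y" lines on stdin and print the number of
# usable samples plus the smallest and largest PSS value (in kB).
pss_range() {
    awk '$3 == "PSS" && $5 ~ /^[0-9]+$/ && $5 + 0 > 0 {
             v = $5 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END { if (n) printf "n=%d min=%d max=%d\n", n, min, max }'
}
```

Usage would be to append `| pss_range` to the one-liner, or to feed it
a file of collected samples.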

> 
> In the get_max_filters calculation I'd rather look at the accounting for the
> system baseline memory usage through subtracting 512 MB, as that is
> way too low nowadays, a clamav alone takes up 1.2 - 1.5 GB.
Would also be my first go-to for adaptation (based on guess-work and
watching top/htop output of a few PMG instances in our support-channels)

I can't remember any case of OOM-killed pmg-smtp-filter processes other
than the one that led to this patch (for any system with > 2 GB memory).
Neither do I recall recommending that someone manually tweak the
max_filters setting - so while an adaptation might make sense, I think
even the current code works well in practice.

We do not need to cover all possible scenarios with it - if someone
adds tons of signatures to ClamAV (as described in the thread from the
commit-message), or avast, or many SpamAssassin plugins, they always have
the option to limit the filter-processes in the config.

> 
> Maybe turn that up first depending on physical_memory, i.e., for < 2 GB I'd
> keep it as is, otherwise deduct something like 1.5 GB.
> 
> Then check the actual memory growth per added filter process via checking
> the PSS sizes, if huge setups then use 200 MB we could increase that a bit,
> but it probably won't need to be the 300 MB of your patch. And we always can
> make $servermem dependent of available memory size too, e.g., assume bigger
> rule sets due to bigger resources available, like > 4 GB (or 8 GB) memory.
> 
> i.e., having a three-branch if here to cover the cases for
> - "low-memory but might work for small setups"
> - "ok'ish memory but needs some special tuning"
> - "more than the minimal recommended amount of memory"
> 
> if ($memory < 2000) {
>     warn "low amount of system memory installed, minimum requirement is 2 GB, recommended is 4+ GB\n";
>     $base = $memory > 1536 ? 1024 : 512;
>     $servermem = 120;
> } elsif ($memory < 4096) {
>     $base = 1500;
>     $servermem = 150;
> } else {
>     $base = 2500;
>     $servermem = 200;
> }
> 
> The $base and $servermem values are just guesstimated without too much
> thought and can surely be better chosen (PSS size of huge setups would be
> required for that).
> 
> A completely different alternative:
> rewrite the filter in rust and decimate the memory usage, making it actually
> more performant and allowing really higher throughput.
For this I usually handwave around - pointing to SpamAssassin being the
largest memory-hog (and run-time and cpu) of the complete pmg-smtp-filter
- but I never actually checked if that's true. So I tried removing
SpamAssassin (very crudely, by commenting out the relevant imports and
replacing the actual checking with a no-op). The test with the small
mail described above yields:
PSS = 13829  RSS = 82420
so ~75-80% of the memory of each pmg-smtp-filter seems to be used by
SpamAssassin.
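
A quick sanity check of that share, comparing the 13829 kB against the
31967-62725 kB PSS range from the parallel test further up. Which
baseline to compare against is a judgment call, and the helper name
`share_saved` is made up for this sketch.

```shell
# Integer percentage by which the second value (kB, without SpamAssassin)
# is smaller than the first value (kB, with SpamAssassin).
share_saved() {
    echo $(( 100 * ($1 - $2) / $1 ))
}

share_saved 62725 13829   # against the upper end of the PSS range
share_saved 31967 13829   # against the lower end of the PSS range
```

This gives 77% against the upper end of the range and 56% against the
lower end (integer division, so rounded down).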

Not that there isn't quite a bit of room for improvement in the code-base,
but I'd probably look at pmgpolicy first for a rewrite (simply because of
SpamAssassin, and my doubt that running SA in another way (spamd[0],
writing our own spamd in rust) would make it significantly faster/smaller).

summing up - the low-memory warning Thomas suggested sounds like a good
idea, I'd probably stick with only 2 branches (<=2.5GB and >2.5GB) - but
fine either way.

my tests with a small ruleset but many filter-processes put the
memory-requirement (PSS-based) at <70MiB/filter-process




