From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Markus Frank <m.frank@proxmox.com>, pmg-devel@lists.proxmox.com
Subject: Re: [pmg-devel] [PATCH pmg-api] config: adjust max_filters calculation to reflect current memory usage
Date: Mon, 15 Jan 2024 19:05:53 +0100
Message-ID: <20240115190553.5edb7637@rosa.proxmox.com>
In-Reply-To: <2e202f39-179a-4f4e-a837-63852add0668@proxmox.com>

Hi,

A bit late to the on-list discussion; I had a few chats with Dominik
off-list and did some minimal testing:

On Fri, 12 Jan 2024 11:13:57 +0100
Thomas Lamprecht <t.lamprecht@proxmox.com> wrote:

> On 10/01/2024 at 12:56, Markus Frank wrote:
> > One pmg-smtp-filter process uses at least 220 MiB.
> > When having 100000 rules one process can take up to 330 MiB.  
> 
> That's probably talking about RSS here, right? That would be rather useless,
> as it counts the memory used by shared libraries again for every process,
> which inflates the number, even though they're actually loaded only once.
> 
> What does the newer PSS (Proportional Set Size) metric show in that case?
> That would be a better metric, as it accounts for shared libraries
> proportionally (each shared page is split among the processes mapping it).
> 
> For example, use the following bash one-liner to get both PSS and RSS of
> each pmg-smtp-filter process:
> 
>  for pid in $(pidof pmg-smtp-filter); do printf "PID %s: " $pid; awk '/Pss:/{ pss += $2 } /Rss:/{ rss += $2 } END { print "PSS =", pss, " RSS =", rss }' "/proc/$pid/smaps"; done

TIL: PSS/USS - thanks! (also for the one-liner :)
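A slightly more defensive variant of the one-liner, as a sketch only: it
prefers /proc/<pid>/smaps_rollup (Linux >= 4.14, already summed per process)
and falls back to /proc/<pid>/smaps, anchors the field names with ^ so that
"SwapPss:" lines are not added to the PSS sum, skips processes that vanish
between pidof and the read, and prints the total PSS over all filter
processes. The helper names sum_pss_rss and report_filter_memory are
hypothetical; values are in the kB units the kernel uses (1 kB = 1024 bytes).

```shell
#!/usr/bin/env bash
# Sketch only: per-process PSS/RSS of pmg-smtp-filter plus the total PSS.
# Helper names (sum_pss_rss, report_filter_memory) are hypothetical.

# Print "PSS RSS" (kB) summed over an smaps-formatted file.
# The ^ anchors keep "SwapPss:" lines out of the PSS sum.
sum_pss_rss() {
    awk '/^Pss:/ { pss += $2 } /^Rss:/ { rss += $2 } END { print pss + 0, rss + 0 }' "$1"
}

report_filter_memory() {
    local pid file pss rss total=0
    for pid in $(pidof pmg-smtp-filter); do
        # smaps_rollup (Linux >= 4.14) is pre-summed; fall back to smaps.
        file="/proc/$pid/smaps_rollup"
        [ -r "$file" ] || file="/proc/$pid/smaps"
        # The process may already be gone (e.g. torn down after a mail).
        [ -r "$file" ] || continue
        read -r pss rss < <(sum_pss_rss "$file" 2>/dev/null) || continue
        printf 'PID %s: PSS = %s  RSS = %s\n' "$pid" "$pss" "$rss"
        total=$((total + pss))
    done
    printf 'total PSS = %s kB (~%s MiB)\n' "$total" "$((total / 1024))"
}
```

Source it in a root shell (reading other processes' smaps usually requires
root) and run report_filter_memory, optionally under watch.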

> 
> Here, on a pretty much idle setup with only a handful of rules, I get (in kB):
> 
> PID 405810: PSS = 84700  RSS = 225000
> PID 405809: PSS = 84714  RSS = 225000


For comparison, two more instances I have access to, both in production and
under real load (although also with quite small rulesets):
PID 2908376: PSS = 114567  RSS = 227528
PID 2908355: PSS = 115023  RSS = 227776
and
PID 788678: PSS = 65242  RSS = 217564
PID 788600: PSS = 69126  RSS = 220656
PID 788507: PSS = 102765  RSS = 227596

> 
> As PSS is what matters here, the 84.7 MB is quite in line with the 120 MB
> $servermem, at least for my (underused) setup.

I tried to get more filter processes to spawn (while running the one-liner
from above under `watch -n2`) by:
* stopping postfix on the sending system
* queueing 250 (also 1250) tiny mails to one recipient there
* starting postfix (so it starts to drain the queue at once)

To get more than 2-3 filter processes, I had to either add a sleep to the
processing (e.g. in analyze_virus_clam) or send a larger mail (with a PDF
attachment that gets handed through pdf2text).
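The load test above could be scripted roughly like this on the sending
system. This is a sketch under assumptions: the sending MTA is Postfix, whose
sendmail(1) compatibility command leaves mail in the maildrop queue while
Postfix is stopped (it is picked up once Postfix starts again); the helper
queue_test_mails, the recipient and the message text are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the load test: queue COUNT tiny mails to one recipient while
# Postfix is stopped, then start it so the queue drains at once.
# queue_test_mails is a hypothetical helper; adjust recipient and count.

queue_test_mails() {
    local recipient=$1 count=$2 i
    for i in $(seq 1 "$count"); do
        # Tiny message: one header, blank line, short body. Postfix adds
        # missing From/Date/Message-ID headers for locally submitted mail.
        printf 'Subject: filter load test %s\n\ntest mail %s\n' "$i" "$i" \
            | sendmail "$recipient"
    done
}

# Example run (as root on the sending system):
#   postfix stop
#   queue_test_mails user@example.com 250
#   postfix start
```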

With that (and a small ruleset, without huge numbers of objects), the PSS per
process was between 31967 and 62725 kB (for 20 parallel processes; sometimes
some PIDs did not yield one or the other metric, probably because they were
being started or torn down). The numbers were similar for both types of mail
(small text/plain only, and with a ~156 kB PDF attachment).

> 
> In the get_max_filters calculation, I'd rather look at how the system's
> baseline memory usage is accounted for by subtracting 512 MB, as that is
> way too low nowadays; clamav alone takes up 1.2 - 1.5 GB.

That would also be my first go-to for adaptation (based on guesswork and on
watching the top/htop output of a few PMG instances in our support channels).

I can't remember any other case of OOM-killed pmg-smtp-filter processes (on
any system with > 2 GB of memory) than the one that led to this patch.
Nor do I recall recommending that someone manually tweak the max_filters
setting. So while adapting it might make sense, I think even the current code
works well in practice.

We do not need to cover all possible scenarios with it: if someone adds tons
of signatures to ClamAV (as described in the thread referenced in the commit
message), or to Avast, or adds many SpamAssassin plugins, they always have the
option to limit the number of filter processes in the config.

> 
> Maybe turn that up first depending on physical_memory, i.e., for < 2 GB I'd
> keep it as is, otherwise deduct something like 1.5 GB.
> 
> Then check the actual memory growth per added filter process via the PSS
> sizes; if huge setups then use 200 MB, we could increase that a bit, but it
> probably won't need to be the 300 MB of your patch. And we can always make
> $servermem dependent on the available memory size too, e.g., assume bigger
> rule sets on systems with more resources, like > 4 GB (or 8 GB) of memory.
> 
> i.e., having a three-branch if here to cover the cases for
> - "low-memory but might work for small setups"
> - "ok'ish memory but needs some special tuning"
> - "more than the minimal recommended amount of memory"
> 
> if ($memory < 2000) {
>     warn "low amount of system memory installed, minimum requirement is 2 GB, recommended is 4+ GB\n";
>     $base = $memory > 1536 ? 1024 : 512;
>     $servermem = 120;
> } elsif ($memory < 4096) {
>     $base = 1500;
>     $servermem = 150;
> } else {
>     $base = 2500;
>     $servermem = 200;
> }
> 
> The $base and $servermem values are just guesstimated without too much
> thought and can surely be chosen better (the PSS sizes of huge setups would
> be required for that).
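To see what the proposed values would yield, here is the three-branch idea
ported to a small shell function. Assumptions, none taken from pmg-api:
memory is given in MB, the number of filter processes is simply
(memory - base) / servermem with integer division, and the result is clamped
to a hypothetical range MIN_FILTERS..MAX_FILTERS (2..40).

```shell
#!/usr/bin/env bash
# Hypothetical shell port of the three-branch proposal, for playing with
# the numbers. Assumptions (not from pmg-api): memory in MB, filters =
# (memory - base) / servermem, clamped to MIN_FILTERS..MAX_FILTERS.
MIN_FILTERS=2
MAX_FILTERS=40

estimate_max_filters() {
    local memory=$1 base servermem filters
    if (( memory < 2000 )); then
        echo "low amount of system memory installed, minimum requirement is 2 GB, recommended is 4+ GB" >&2
        if (( memory > 1536 )); then base=1024; else base=512; fi
        servermem=120
    elif (( memory < 4096 )); then
        base=1500
        servermem=150
    else
        base=2500
        servermem=200
    fi
    # Integer division (bash truncates toward zero).
    filters=$(( (memory - base) / servermem ))
    (( filters < MIN_FILTERS )) && filters=$MIN_FILTERS
    (( filters > MAX_FILTERS )) && filters=$MAX_FILTERS
    echo "$filters"
}
```

With these placeholder values, 4096 MB yields 7 filters and 8192 MB yields 28,
while 1800 MB yields 6 but 2048 MB only 3: the estimate is not monotonic
across the 2 GB boundary, which real PSS numbers from large setups should help
smooth out.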
> 
> A completely different alternative:
> rewrite the filter in Rust, drastically reducing the memory usage, making it
> actually more performant and allowing much higher throughput.

For this, I usually handwave and point to SpamAssassin as the largest memory
hog (and also the largest consumer of run time and CPU) in the whole
pmg-smtp-filter, but I had never actually checked whether that's true. So I
tried removing SpamAssassin (very crudely, by commenting out the relevant
imports and replacing the actual check with a no-op). The test with the small
mail described above yields:
PSS = 13829  RSS = 82420
so ~75-80% of the memory of each pmg-smtp-filter seems to be used by
SpamAssassin.
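The share attributed to SpamAssassin is 1 - PSS_without_SA / PSS_with_SA. As
a worked example (sa_share_percent is a hypothetical helper using bash integer
arithmetic), the 13829 kB without SpamAssassin measured against the upper end
of the 20-process range above (62725 kB) comes to about 78%; the result
depends on which with-SA figure is taken as the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate percentage of a filter process's PSS
# attributable to SpamAssassin, from PSS (kB) with and without SA.
# Integer arithmetic: 100 - floor(100 * without / with).
sa_share_percent() {
    local with_sa=$1 without_sa=$2
    echo $(( 100 - 100 * without_sa / with_sa ))
}

# usage: sa_share_percent 62725 13829   -> 78
```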

Not that there isn't quite a bit of room for improvement in the code base,
but I'd probably look at rewriting pmgpolicy first (simply because of
SpamAssassin, and my doubt that running SA in another way (spamd[0], or
writing our own spamd in Rust) would make it significantly faster or smaller).

Summing up: the low-memory warning Thomas suggested sounds like a good idea.
I'd probably stick with only 2 branches (<= 2.5 GB and > 2.5 GB), but I'm
fine either way.

My tests with a small ruleset, but many filter processes, put the (PSS-based)
memory requirement at < 70 MiB per filter process.
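Turned around, the < 70 MiB per process figure gives a rough upper bound on
how many filter processes fit into the memory left after the system baseline.
A sketch with a hypothetical helper filters_for_budget; the baseline is an
input you choose yourself (for instance the 1.2 - 1.5 GB clamav figure quoted
earlier plus whatever else runs on the host).

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many filter processes of PER_FILTER MiB (PSS)
# fit into TOTAL MiB of RAM after reserving BASELINE MiB for everything else.
# PER_FILTER defaults to the ~70 MiB measured in this thread.
filters_for_budget() {
    local total=$1 baseline=$2 per_filter=${3:-70}
    local free=$(( total - baseline ))
    # Never report a negative count on undersized hosts.
    (( free < 0 )) && free=0
    echo $(( free / per_filter ))
}
```

For example, 4096 MiB with a 1536 MiB baseline leaves room for about 36
processes at 70 MiB each, but only 12 at 200 (the $servermem value proposed
for > 4 GB systems).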






Thread overview: 10+ messages
2024-01-10 11:56 Markus Frank
2024-01-10 12:52 ` Dietmar Maurer
2024-01-10 13:04   ` Markus Frank
2024-01-10 13:38   ` Dominik Csapak
2024-01-10 13:40     ` Dominik Csapak
2024-01-10 13:57     ` Dietmar Maurer
2024-01-10 14:13       ` Dominik Csapak
2024-01-10 14:32         ` Dietmar Maurer
2024-01-12 10:13 ` Thomas Lamprecht
2024-01-15 18:05   ` Stoiko Ivanov [this message]
