public inbox for pve-devel@lists.proxmox.com
* Re: [pve-devel] applied: [RFC pve-qemu] disable jemalloc
       [not found] <1c4d80a05d8328a52b9d15e991fd4d348bce1327.camel@groupe-cyllene.com>
@ 2023-03-11 13:14 ` DERUMIER, Alexandre
  2023-03-13  7:17   ` DERUMIER, Alexandre
  0 siblings, 1 reply; 5+ messages in thread
From: DERUMIER, Alexandre @ 2023-03-11 13:14 UTC (permalink / raw)
  To: pve-devel

On Saturday, 11 March 2023 at 10:01 +0100, Thomas Lamprecht wrote:
> Hi,
> 
> On 10/03/2023 at 19:05, DERUMIER, Alexandre wrote:
> > I'm currently benchmarking QEMU with librbd and memory allocators again.
> > 
> > It seems there is still a performance problem with the default glibc
> > allocator: around 20-25% fewer IOPS and higher latency.
> 
> Are those numbers compared to jemalloc or tcmalloc?
> 
Oh, sorry: tcmalloc. (I'm getting almost the same results with jemalloc,
maybe a little lower and less stable.)


> Also, a key problem with allocator tuning is that it's heavily dependent
> on the workload of each specific library (i.e., not only QEMU itself but
> also the specific block backend library).
> 
> > 
Yes, it should mainly help librbd. I don't think it helps other storage backends.



> > From my benchmarks, I'm at around 60k IOPS vs 80-90k IOPS with 4k randread.
> > 
> > Red Hat has also noticed it.
> > 
> > 
> > I know that jemalloc was buggy with the Rust lib && PBS block driver,
> > but have you evaluated tcmalloc?
> 
> Yes, for PBS once - IIRC it was way worse in how it generally worked than
> either jemalloc or the default glibc, but I don't think I checked for
> latency; back then we tracked the freed memory that the allocator did not
> give back to the system down to how they internally try to keep a pool of
> available memory around.
> 
I know that jemalloc can have strange effects on memory. (Ceph was using
jemalloc some years ago with this kind of side effect, and they later
migrated to tcmalloc.)


> So for latency it might be a win, but IMO I'm not too sure if the other
> effects it has are worth that.
> 
> > 
Yes, latency is my main objective, mainly for Ceph synchronous writes with
low iodepth; they are pretty slow, so a 20% improvement is really big.

> > Note that it's possible to load it dynamically with LD_PRELOAD,
> > so maybe we could add an option in the VM config to enable it?
> > 

> I'm not 100% sure if QEMU copes well with preloading it via the dynamic
> linker as is, or if we need to hard-disable malloc_trim support for it
> then. Currently, with the "system" allocator (glibc), malloc_trim is
> called (semi-)periodically via call_rcu_thread - and at least QEMU's
> meson build system config disables malloc_trim for tcmalloc or jemalloc.
> 
> 
> Or did you already test this directly on QEMU, not just with an rbd bench?
> If so, I'd be open to adding some tuning config with an allocator
> sub-property in there to our CFGs.
> 

I have tried it directly in QEMU, with:

"
    my $run_qemu = sub {
        PVE::Tools::run_fork sub {

            $ENV{LD_PRELOAD} = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4";

            PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid",
                %systemd_properties);
"

I really don't know about malloc_trim; the initial discussion about it is here:
https://patchwork.ozlabs.org/project/qemu-devel/patch/1510899814-19372-1-git-send-email-yang.zhong@intel.com/
and indeed, it's disabled when building with tcmalloc/jemalloc, but I don't
know about dynamic loading.

But I don't get any crash or segfault.




* Re: [pve-devel] applied: [RFC pve-qemu] disable jemalloc
  2023-03-11 13:14 ` [pve-devel] applied: [RFC pve-qemu] disable jemalloc DERUMIER, Alexandre
@ 2023-03-13  7:17   ` DERUMIER, Alexandre
  0 siblings, 0 replies; 5+ messages in thread
From: DERUMIER, Alexandre @ 2023-03-13  7:17 UTC (permalink / raw)
  To: pve-devel; +Cc: t.lamprecht

I have done tests with a small C program calling malloc_trim(0),
and it doesn't break/segfault with tcmalloc LD_PRELOADed.

I don't think tcmalloc overrides this specific glibc function, but maybe
malloc_trim is trimming empty glibc malloc memory.


I have done 2 days of continuous fio benchmarking in a VM with tcmalloc
preloaded, and I haven't had any problems.

But the speed difference is really night and day: with iodepth=64 4k randread,
it's something like 85-90k IOPS on average (with some spikes at 120k) vs
50k IOPS (with spikes to 60k).


If it's OK for you, I'll send a patch with something like:

vmid.conf
---------
memory_allocator: glibc|tcmalloc


and simply add the LD_PRELOAD to the systemd unit when the VM is starting?
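The wiring I have in mind looks roughly like this (only a sketch: the
memory_allocator property name, the helper and the library path are this
proposal, not existing PVE code; the tcmalloc path is the one I used above):

"
    my $allocator_libs = {
        tcmalloc => '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4',
    };

    # proposed vmid.conf property (name and schema are only this proposal)
    my $memory_allocator_property = {
        type => 'string',
        enum => ['glibc', 'tcmalloc'],
        default => 'glibc',
        optional => 1,
        description => "Memory allocator preloaded into the QEMU process.",
    };

    # called right before entering the systemd scope, so the forked QEMU
    # process inherits the environment variable
    sub set_allocator_preload {
        my ($conf) = @_;
        my $alloc = $conf->{memory_allocator} // 'glibc';
        return if $alloc eq 'glibc'; # glibc is the default, nothing to preload
        my $lib = $allocator_libs->{$alloc}
            or die "unknown memory allocator '$alloc'\n";
        $ENV{LD_PRELOAD} = $lib;
    }
"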





* Re: [pve-devel] applied: [RFC pve-qemu] disable jemalloc
  2023-03-10 18:05   ` DERUMIER, Alexandre
@ 2023-03-11  9:01     ` Thomas Lamprecht
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2023-03-11  9:01 UTC (permalink / raw)
  To: DERUMIER, Alexandre, pve-devel

Hi,

On 10/03/2023 at 19:05, DERUMIER, Alexandre wrote:
> I'm currently benchmarking QEMU with librbd and memory allocators again.
> 
> It seems there is still a performance problem with the default glibc
> allocator: around 20-25% fewer IOPS and higher latency.

Are those numbers compared to jemalloc or tcmalloc?

Also, a key problem with allocator tuning is that it's heavily dependent on
the workload of each specific library (i.e., not only QEMU itself but also
the specific block backend library).

> 
> From my benchmarks, I'm at around 60k IOPS vs 80-90k IOPS with 4k randread.
> 
> Red Hat has also noticed it:
> https://bugzilla.redhat.com/show_bug.cgi?id=1717414
> https://sourceware.org/bugzilla/show_bug.cgi?id=28050
> 
> 
> I know that jemalloc was buggy with the Rust lib && PBS block driver,
> but have you evaluated tcmalloc?

Yes, for PBS once - IIRC it was way worse in how it generally worked than
either jemalloc or the default glibc, but I don't think I checked for latency;
back then we tracked the freed memory that the allocator did not give back to
the system down to how they internally try to keep a pool of available memory
around.

So for latency it might be a win, but IMO I'm not too sure if the other
effects it has are worth that.

> 
> Note that it's possible to load it dynamically with LD_PRELOAD,
> so maybe we could add an option in the VM config to enable it?
> 

I'm not 100% sure if QEMU copes well with preloading it via the dynamic linker
as is, or if we need to hard-disable malloc_trim support for it then.
Currently, with the "system" allocator (glibc), malloc_trim is called
(semi-)periodically via call_rcu_thread - and at least QEMU's meson build
system config disables malloc_trim for tcmalloc or jemalloc.


Or did you already test this directly on QEMU, not just with an rbd bench?
If so, I'd be open to adding some tuning config with an allocator sub-property
in there to our CFGs.





* Re: [pve-devel] applied:  [RFC pve-qemu] disable jemalloc
  2020-12-15 13:43 ` [pve-devel] applied: " Thomas Lamprecht
@ 2023-03-10 18:05   ` DERUMIER, Alexandre
  2023-03-11  9:01     ` Thomas Lamprecht
  0 siblings, 1 reply; 5+ messages in thread
From: DERUMIER, Alexandre @ 2023-03-10 18:05 UTC (permalink / raw)
  To: pve-devel; +Cc: t.lamprecht

Hi,
sorry for bumping this old thread.

I'm currently benchmarking QEMU with librbd and memory allocators again.


It seems there is still a performance problem with the default glibc
allocator: around 20-25% fewer IOPS and higher latency.

From my benchmarks, I'm at around 60k IOPS vs 80-90k IOPS with 4k randread.

Red Hat has also noticed it:
https://bugzilla.redhat.com/show_bug.cgi?id=1717414
https://sourceware.org/bugzilla/show_bug.cgi?id=28050


I know that jemalloc was buggy with the Rust lib && PBS block driver,
but have you evaluated tcmalloc?


Note that it's possible to load it dynamically with LD_PRELOAD,
so maybe we could add an option in the VM config to enable it?
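For illustration, dynamically loading it just means setting the environment
before the QEMU binary is exec'd, something like the following (only a sketch,
not actual PVE code; the library path and the qemu-system-x86_64 binary are
assumptions):

"
    use strict;
    use warnings;

    # the exec'd process resolves malloc/free through the preloaded
    # allocator instead of glibc's malloc
    $ENV{LD_PRELOAD} = '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4';
    exec('qemu-system-x86_64', '-version') or die "exec failed: $!\n";
"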






* [pve-devel] applied:  [RFC pve-qemu] disable jemalloc
  2020-12-10 15:23 [pve-devel] " Stefan Reiter
@ 2020-12-15 13:43 ` Thomas Lamprecht
  2023-03-10 18:05   ` DERUMIER, Alexandre
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2020-12-15 13:43 UTC (permalink / raw)
  To: Proxmox VE development discussion, Stefan Reiter

On 10.12.20 16:23, Stefan Reiter wrote:
> jemalloc does not play nice with our Rust library (proxmox-backup-qemu),
> specifically it never releases memory allocated from Rust to the OS.
> This leads to a problem with larger caches (e.g. for the PBS block driver).
> 
> It appears to be related to this GitHub issue:
> https://github.com/jemalloc/jemalloc/issues/1398
> 
> The background_thread solution seems weirdly hacky, so let's disable
> jemalloc entirely for now.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> @Alexandre: you were the one to introduce jemalloc into our QEMU builds a long
> time ago - does it still provide a measurable benefit? If the performance loss
> would be too great in removing it, we could maybe figure out some workarounds as
> well.
> 
> Its current behaviour does seem rather broken to me though...
> 
>  debian/rules | 1 -
>  1 file changed, 1 deletion(-)
> 
>

applied, thanks!





