public inbox for pve-devel@lists.proxmox.com
* [pve-devel] qemu + tcmalloc for rbd
@ 2024-01-09 17:02 DERUMIER, Alexandre
  2024-01-10  9:12 ` Fiona Ebner
  0 siblings, 1 reply; 6+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-09 17:02 UTC (permalink / raw)
  To: pve-devel

Hi,

I still have this patch from last year pending:

https://lists.proxmox.com/pipermail/pve-devel/2023-May/056815.html

to conditionally enable tcmalloc in QEMU.

It's still required for good performance with librbd on the latest
qemu/librbd; I get a 30-40% improvement in IOPS and latency for small
reads/writes.


I would like to have a solution in the Proxmox repo instead of
maintaining it on my side.


Currently, in production, I compile QEMU with tcmalloc at build time.

This patch series allows using LD_PRELOAD and disables the
malloc_trim() call in QEMU.

I'm not an expert in C (I reused code from HAProxy, which does exactly
the same thing with tcmalloc and trim), so it would be great if
somebody could review it :)
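
For illustration only, here is a rough sketch of the kind of runtime
check the patch performs before calling malloc_trim() (this is not the
actual patch code; the tc_malloc symbol probe and the helper names are
my assumptions):

#define _GNU_SOURCE
#include <dlfcn.h>    /* dlsym(), RTLD_DEFAULT; link with -ldl on older glibc */
#include <malloc.h>   /* malloc_trim() */
#include <stdbool.h>

/* If a tcmalloc-specific symbol resolves, tcmalloc was most likely
 * LD_PRELOADed (or linked in), so glibc's malloc_trim() is pointless. */
static bool using_tcmalloc(void)
{
    return dlsym(RTLD_DEFAULT, "tc_malloc") != NULL;
}

static void maybe_trim(void)
{
    if (!using_tcmalloc())
        malloc_trim(0);
}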



Another way (maybe safer) would be to build two binaries in the same
package (/usr/bin/kvm-tcmalloc and /usr/bin/kvm) and give the user the
option to choose.










* Re: [pve-devel] qemu + tcmalloc for rbd
  2024-01-09 17:02 [pve-devel] qemu + tcmalloc for rbd DERUMIER, Alexandre
@ 2024-01-10  9:12 ` Fiona Ebner
  2024-01-10  9:38   ` DERUMIER, Alexandre
  2024-01-10 10:15   ` Thomas Lamprecht
  0 siblings, 2 replies; 6+ messages in thread
From: Fiona Ebner @ 2024-01-10  9:12 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre

On 09.01.24 at 18:02, DERUMIER, Alexandre wrote:
> Hi,
> 
> I still have this patch from last year pending:
> 
> https://lists.proxmox.com/pipermail/pve-devel/2023-May/056815.html
> 
> to conditionally enable tcmalloc in QEMU.
> 
> It's still required for good performance with librbd on the latest
> qemu/librbd; I get a 30-40% improvement in IOPS and latency for small
> reads/writes.
> 
> 
> I would like to have a solution in the Proxmox repo instead of
> maintaining it on my side.
> 
> 
> Currently, in production, I compile QEMU with tcmalloc at build time.
> 
> This patch series allows using LD_PRELOAD and disables the
> malloc_trim() call in QEMU.
> 
> I'm not an expert in C (I reused code from HAProxy, which does exactly
> the same thing with tcmalloc and trim), so it would be great if
> somebody could review it :)
> 

Unfortunately, the QEMU patch seems rather hacky and I'd prefer not to
include and maintain it. If tcmalloc just provided malloc_trim()
(e.g. as a no-op), there would be no need for the ugly
at-runtime detection at all.

> 
> Another way (maybe safer) would be to build two binaries in the same
> package (/usr/bin/kvm-tcmalloc and /usr/bin/kvm) and give the user the
> option to choose.
> 

If we go for this route, I'd rather have two conflicting packages to
avoid the redundant space usage for most people. Or if we really want to
support being able to use both on a single system, an add-on kind of
package with just the second binary.

For reference, since it's been a while, previous discussion:
https://lists.proxmox.com/pipermail/pve-devel/2023-March/056129.html

Maybe you'd also like to ping the upstream enhancement request for glibc?
https://sourceware.org/bugzilla/show_bug.cgi?id=28050





* Re: [pve-devel] qemu + tcmalloc for rbd
  2024-01-10  9:12 ` Fiona Ebner
@ 2024-01-10  9:38   ` DERUMIER, Alexandre
  2024-01-10 14:53     ` Thomas Lamprecht
  2024-01-10 10:15   ` Thomas Lamprecht
  1 sibling, 1 reply; 6+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-10  9:38 UTC (permalink / raw)
  To: pve-devel, f.ebner

Hi Fiona, (and happy new year)




> Currently, in production, I compile QEMU with tcmalloc at build time.
> 
> This patch series allows using LD_PRELOAD and disables the
> malloc_trim() call in QEMU.
> 
> I'm not an expert in C (I reused code from HAProxy, which does exactly
> the same thing with tcmalloc and trim), so it would be great if
> somebody could review it :)
> 

>>Unfortunately, the QEMU patch seems rather hacky and I'd prefer not
>>to include and maintain it. If tcmalloc just provided malloc_trim()
>>(e.g. as a no-op), there would be no need for the ugly
>>at-runtime detection at all.

OK, no problem.

Note that I don't remember having seen problems with a simple
LD_PRELOAD, even though QEMU is calling malloc_trim().
But that was only benchmarking, not production.

The Ceph folks have also benchmarked it with LD_PRELOAD here:
https://ceph.io/en/news/blog/2022/qemu-kvm-tuning/


> 
> Another way (maybe safer) would be to build two binaries in the same
> package (/usr/bin/kvm-tcmalloc and /usr/bin/kvm) and give the user the
> option to choose.
> 

>>If we go for this route, I'd rather have two conflicting packages to
>>avoid the redundant space usage for most people. Or if we really want
>>to
>>support being able to use both on a single system, an add-on kind of
>>package with just the second binary.

Personally, I'm fine with 2 packages,

pve-qemu and pve-qemu-tcmalloc for example,

with something like this in the package control file for the tcmalloc
variant:

Package: pve-qemu-tcmalloc
Provides: pve-qemu
Conflicts: pve-qemu
Replaces: pve-qemu

?




>>For reference, since it's been a while, previous discussion:
>>https://lists.proxmox.com/pipermail/pve-devel/2023-March/056129.html
>>
>>Maybe you'd also like to ping the upstream enhancement request for
>>glibc?
>>https://sourceware.org/bugzilla/show_bug.cgi?id=28050

Yes, good idea. (I don't have much hope about this ^_^)


* Re: [pve-devel] qemu + tcmalloc for rbd
  2024-01-10  9:12 ` Fiona Ebner
  2024-01-10  9:38   ` DERUMIER, Alexandre
@ 2024-01-10 10:15   ` Thomas Lamprecht
  2024-01-10 18:29     ` DERUMIER, Alexandre
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Lamprecht @ 2024-01-10 10:15 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fiona Ebner, DERUMIER, Alexandre

On 10/01/2024 at 10:12, Fiona Ebner wrote:
> On 09.01.24 at 18:02, DERUMIER, Alexandre wrote:
>> Another way (maybe safer) would be to build two binaries in the same
>> package (/usr/bin/kvm-tcmalloc and /usr/bin/kvm) and give the user
>> the option to choose.
>>
> 
> If we go for this route, I'd rather have two conflicting packages to
> avoid the redundant space usage for most people. Or if we really want to
> support being able to use both on a single system, an add-on kind of
> package with just the second binary.

I would like to avoid both, an extra package and an extra binary, as
either is a significant overhead, both for us to maintain and for users
to work with.

Either tcmalloc works as an allocator and is an actual improvement
overall, in which case we should use it unconditionally, or it doesn't
work or is no improvement, in which case it makes no sense to expose it.

IME changing the allocator is nice for some specific benchmarks, but
overall there are often even regressions in other configurations.
We e.g. tested different allocators extensively when debugging some
memory growth during PBS backups; none of the shiny alternatives was
really any better, so we resorted to triggering explicit trims.





* Re: [pve-devel] qemu + tcmalloc for rbd
  2024-01-10  9:38   ` DERUMIER, Alexandre
@ 2024-01-10 14:53     ` Thomas Lamprecht
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2024-01-10 14:53 UTC (permalink / raw)
  To: Proxmox VE development discussion, DERUMIER, Alexandre, f.ebner

On 10/01/2024 at 10:38, DERUMIER, Alexandre wrote:
>>> Unfortunately, the QEMU patch seems rather hacky and I'd prefer not
>>> to include and maintain it. If tcmalloc just provided malloc_trim()
>>> (e.g. as a no-op), there would be no need for the ugly
>>> at-runtime detection at all.
> OK, no problem.
> 
> Note that I don't remember having seen problems with a simple
> LD_PRELOAD, even though QEMU is calling malloc_trim().
> But that was only benchmarking, not production.

Yeah, because LD_PRELOAD just overrides symbols, but does not
unload libraries or the like.

So when you LD_PRELOAD the tcmalloc.so, it will load the malloc-related
functions, but as it doesn't ship any malloc_trim() function (symbol),
the one from glibc (which is still present) will be called, which is
useless but probably not that much extra work, as the glibc malloc heap
is rather empty anyway.

If one builds and links directly against tcmalloc, glibc's malloc is
completely out of the picture; that's why QEMU added a conflict when
both an alternative malloc implementation and the malloc-trim configure
option are enabled.

If we want to expose this dynamically, the qemu patch probably wouldn't
be required for the above reason anyway (or we could just ship our own
tcmalloc, or a separate object that provides a no-op malloc_trim()
function symbol).
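
A minimal sketch of what such a separate preload object could look like
(a hypothetical libnotrim.so; the name, build line and library path are
only for illustration, not something we actually ship):

/* No-op replacement for glibc's malloc_trim(), preloaded together with
 * libtcmalloc so QEMU's trim calls resolve here instead of in glibc:
 *   gcc -shared -fPIC -o libnotrim.so notrim.c
 *   LD_PRELOAD="libtcmalloc.so.4:./libnotrim.so" /usr/bin/kvm ...
 */
#include <stddef.h>

int malloc_trim(size_t pad)
{
    (void)pad;   /* ignore the padding argument */
    return 0;    /* glibc semantics: 0 means no memory was released */
}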





* Re: [pve-devel] qemu + tcmalloc for rbd
  2024-01-10 10:15   ` Thomas Lamprecht
@ 2024-01-10 18:29     ` DERUMIER, Alexandre
  0 siblings, 0 replies; 6+ messages in thread
From: DERUMIER, Alexandre @ 2024-01-10 18:29 UTC (permalink / raw)
  To: pve-devel, t.lamprecht, f.ebner


On 10/01/2024 at 10:12, Fiona Ebner wrote:
> On 09.01.24 at 18:02, DERUMIER, Alexandre wrote:
> > Another way (maybe safer) would be to build two binaries in the same
> > package (/usr/bin/kvm-tcmalloc and /usr/bin/kvm) and give the user
> > the option to choose.
> > 
> 
> If we go for this route, I'd rather have two conflicting packages to
> avoid the redundant space usage for most people. Or if we really want
> to
> support being able to use both on a single system, an add-on kind of
> package with just the second binary.

>>I would like to avoid both, an extra package and an extra binary, as
>>either is a significant overhead, both for us to maintain and for
>>users to work with.
>>
>>Either tcmalloc works as an allocator and is an actual improvement
>>overall, in which case we should use it unconditionally, or it doesn't
>>work or is no improvement, in which case it makes no sense to expose
>>it.

Yes, I totally understand that. If users have problems (maybe not
related to the allocator), all the debugging tests would need to be
done twice (iothreads, ...).

>>IME changing the allocator is nice for some specific benchmarks, but
>>overall there are often even regressions in other configurations.

Personally, I really see a big difference on real workloads in terms of
latency (and latency is really where Ceph can be slow).

>>We e.g. tested different allocators extensively when debugging some
>>memory growth during PBS backups; none of the shiny alternatives was
>>really any better, so we resorted to triggering explicit trims.

Yes, I remember that. I don't use PBS, so I can't comment on this.

Do you have more information on how to reproduce this?
I could test tcmalloc again with PBS backups.





