From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 4C6B691B52
 for <pve-devel@lists.proxmox.com>; Sat, 11 Mar 2023 10:02:01 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 2FFC5285E9
 for <pve-devel@lists.proxmox.com>; Sat, 11 Mar 2023 10:01:31 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Sat, 11 Mar 2023 10:01:30 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id F1E2441CB8;
 Sat, 11 Mar 2023 10:01:29 +0100 (CET)
Message-ID: <ab594136-845f-bcf9-c1b6-8aaeea8ef486@proxmox.com>
Date: Sat, 11 Mar 2023 10:01:28 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:111.0) Gecko/20100101
 Thunderbird/111.0
Content-Language: en-GB, de-AT
To: "DERUMIER, Alexandre" <Alexandre.DERUMIER@groupe-cyllene.com>,
 "pve-devel@lists.proxmox.com" <pve-devel@lists.proxmox.com>
References: <20201210152338.19423-1-s.reiter@proxmox.com>
 <3dd51877-142c-55b2-e2ed-33f48ecb348f@proxmox.com>
 <5d774074596e5430faaa26146c70fdd13b513598.camel@groupe-cyllene.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <5d774074596e5430faaa26146c70fdd13b513598.camel@groupe-cyllene.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.050 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pve-devel] applied: [RFC pve-qemu] disable jemalloc
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Sat, 11 Mar 2023 09:02:01 -0000

Hi,

On 10/03/2023 at 19:05, DERUMIER, Alexandre wrote:
> I'm currently benchmarking QEMU with librbd and different memory
> allocators again.
> 
> It seems that there are still performance problems with the default
> glibc allocator, around 20-25% fewer IOPS and higher latency.

Are those numbers compared to jemalloc or tcmalloc?

Also, a key problem with allocator tuning is that it's heavily dependent
on the workload of each specific component, i.e., not only QEMU itself
but also the specific block backend library.

> 
> From my benchmark, I'm at around 60k IOPS vs 80-90k IOPS with 4k
> randread.
> 
> Red Hat has also noticed it:
> https://bugzilla.redhat.com/show_bug.cgi?id=1717414
> https://sourceware.org/bugzilla/show_bug.cgi?id=28050
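
For reference, a comparable 4k randread run against librbd would be
something along the lines of the following (pool/image are placeholders,
not necessarily Alexandre's exact invocation):

  rbd bench --io-type read --io-pattern rand --io-size 4K \
      --io-threads 16 <pool>/<image>
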
> 
> 
> I know that jemalloc was buggy with the Rust lib && the PBS block
> driver, but have you evaluated tcmalloc?

Yes, once for PBS - IIRC it was way worse in how it generally behaved
than either jemalloc or the default glibc allocator, but I don't think I
checked latency specifically; back then we tracked freed memory that the
allocator did not give back to the system down to how these allocators
internally try to keep a pool of available memory around.

So for latency it might be a win, but IMO I'm not too sure whether the
other effects it has are worth that.

> 
> Note that it's possible to load it dynamically with LD_PRELOAD,
> so maybe we could add an option in the VM config to enable it?
> 
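
For reference, preloading it would look something like the following
(the library path is Debian's, from the libgoogle-perftools4 package,
adapt as needed):

  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 \
      qemu-system-x86_64 ...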

I'm not 100% sure whether QEMU copes well with preloading it via the
dynamic linker as is, or whether we'd need to hard-disable malloc_trim
support for it then. Currently, with the "system" allocator (glibc),
malloc_trim is called (semi-)periodically via the call_rcu_thread - and
at least QEMU's meson build config disables malloc_trim when building
against tcmalloc or jemalloc.
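
The pattern in question is roughly the following - a sketch from memory,
not the verbatim QEMU source, and the helper name is only illustrative:

  /* malloc_trim() call as done in QEMU's call_rcu_thread(); it is
   * only compiled in when CONFIG_MALLOC_TRIM is set, which meson
   * disables for tcmalloc/jemalloc as they lack malloc_trim(). */
  #if defined(CONFIG_MALLOC_TRIM)
  #include <malloc.h>
  #endif

  static void rcu_callbacks_done(void)
  {
  #if defined(CONFIG_MALLOC_TRIM)
      /* return free heap above a 4 MiB pad back to the kernel */
      malloc_trim(4 * 1024 * 1024);
  #endif
  }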


Or did you already test this directly with QEMU, not just rbd bench? If
so, I'd be open to adding some tuning config with an allocator
sub-property to our CFGs.
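
Purely as a sketch of what such a property could look like - the naming
is nothing more than a placeholder here:

  # /etc/pve/qemu-server/<vmid>.conf (hypothetical property)
  tuning: allocator=tcmalloc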