From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 10 Apr 2026 10:12:45 +0200	[thread overview]
Message-ID: <b1db5ad4-b3aa-4980-afd6-56cce2120fad@proxmox.com> (raw)
In-Reply-To: <20260410043027.3621673-1-k.chai@proxmox.com>

On 10.04.26 at 6:30 AM, Kefu Chai wrote:
> Following up on the RFC thread [0], here's the formal submission to
> re-enable tcmalloc for pve-qemu.
> 
> Quick recap: librbd's I/O path allocates a lot of small, short-lived
> objects with plain new/malloc (ObjectReadRequest, bufferlist, etc.),
> and glibc's ptmalloc2 handles this pattern poorly -- cross-thread
> arena contention and cache-line bouncing show up clearly in perf
> profiles. tcmalloc's per-thread fast path avoids both.
>
> A bit of history for context: tcmalloc was tried in 2015 but dropped
> after 8 days due to gperftools 2.2 tuning issues (fixed in 2.4+).
> jemalloc replaced it but was dropped in 2020 because it didn't
> release Rust-allocated memory (from proxmox-backup-qemu) back to the
> OS. PVE 9 ships gperftools 2.16, and patch 1/2 addresses the
> reclamation gap explicitly.
> 
> On Dietmar's two concerns from the RFC:
> 
> "Could ReleaseFreeMemory() halt the application?" -- No, and I
> verified this directly. It walks tcmalloc's page heap free span
> lists and calls madvise(MADV_DONTNEED) on each span. It does not
> walk allocated memory or compact the heap. A standalone test
> reclaimed 386 MB of 410 MB cached memory (94%) in effectively zero
> wall time. The call runs once at backup completion, in the same spot
> where malloc_trim runs today.
> 
> "Wouldn't a pool allocator in librbd be the proper fix?" -- In
> principle yes, but I audited librbd in Ceph squid and it does NOT
> use a pool allocator -- all I/O path objects go through plain new.
> Ceph's mempool is tracking-only, not actual pooling. Adding real
> pooling would be a significant Ceph-side change (submission and
> completion happen on different threads), and it's orthogonal to the
> allocator choice here.
> 
> Also thanks to Alexandre for confirming the 2015 gperftools issues
> are long resolved.
> 
> Test results
> ------------
> 
> Benchmarked on a local vstart Ceph cluster (3 OSDs on local NVMe).
> This is the worst case for showing allocator impact, since there's
> no network latency for CPU savings to amortize against:
> 
>   rbd bench --io-type read --io-size 4096 --io-threads 16 \
>             --io-pattern rand
> 
>   Metric         | glibc ptmalloc2 | tcmalloc  | Delta
>   ---------------+-----------------+-----------+--------
>   IOPS           |         131,201 |   136,389 |  +4.0%
>   CPU time       |        1,556 ms |  1,439 ms |  -7.5%
>   Cycles         |           6.74B |     6.06B | -10.1%
>   Cache misses   |          137.1M |    123.9M |  -9.6%
> 
> perf report on the glibc run shows ~8% of CPU in allocator internals
> (_int_malloc, cfree, malloc_consolidate, _int_free_*); the same
> symbols are barely visible with tcmalloc because the fast path is
> just a pointer bump. The Ceph blog [1] reports ~50% IOPS gain on
> production clusters where network RTT dominates per-I/O latency --
> the 10% CPU savings compound there since the host can push more I/O
> into the pipeline during the same wall time.

How does the performance change when doing IO within a QEMU guest?

How does this affect performance for other storage types, like ZFS,
qcow2 on top of directory-based storages, qcow2 on top of LVM, LVM-thin,
etc., and for other workloads: saving VM state during snapshot, transfer
during migration, maybe memory hotplug/ballooning, network performance
for vNICs?




Thread overview: 5+ messages
2026-04-10  4:30 Kefu Chai
2026-04-10  4:30 ` [PATCH pve-qemu 1/2] PVE: use " Kefu Chai
2026-04-10  4:30 ` [PATCH pve-qemu 2/2] d/rules: enable " Kefu Chai
2026-04-10  8:12 ` Fiona Ebner [this message]
2026-04-10 10:45   ` [PATCH pve-qemu 0/2] Re-enable " DERUMIER, Alexandre
