From: Kefu Chai <k.chai@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 10 Apr 2026 12:30:25 +0800
Message-ID: <20260410043027.3621673-1-k.chai@proxmox.com>

Following up on the RFC thread [0], here's the formal submission to
re-enable tcmalloc for pve-qemu.

Quick recap: librbd's I/O path allocates a lot of small, short-lived
objects with plain new/malloc (ObjectReadRequest, bufferlist, etc.),
and glibc's ptmalloc2 handles this pattern poorly -- cross-thread
arena contention and cache-line bouncing show up clearly in perf
profiles. tcmalloc's per-thread fast path avoids both.
A bit of history for context: tcmalloc was tried in 2015 but dropped
after 8 days due to gperftools 2.2 tuning issues (fixed in 2.4+).
jemalloc replaced it but was dropped in 2020 because it didn't
release Rust-allocated memory (from proxmox-backup-qemu) back to the
OS. PVE 9 ships gperftools 2.16, and patch 1/2 addresses the
reclamation gap explicitly.

On Dietmar's two concerns from the RFC:

"Could ReleaseFreeMemory() halt the application?" -- No, and I
verified this directly. It walks tcmalloc's page-heap free span
lists and calls madvise(MADV_DONTNEED) on each span. It does not
walk allocated memory or compact the heap. A standalone test
reclaimed 386 MB of 410 MB cached memory (94%) in effectively zero
wall time. The call runs once at backup completion, in the same
spot where malloc_trim runs today.
"Wouldn't a pool allocator in librbd be the proper fix?" -- In
principle yes, but I audited librbd in Ceph squid and it does NOT
use a pool allocator -- all I/O path objects go through plain new.
Ceph's mempool is tracking-only, not actual pooling. Adding real
pooling would be a significant Ceph-side change (submission and
completion happen on different threads), and it's orthogonal to the
allocator choice here.

Also thanks to Alexandre for confirming the 2015 gperftools issues
are long resolved.

Test results
------------

Benchmarked on a local vstart Ceph cluster (3 OSDs on local NVMe).
This is the worst case for showing allocator impact, since there's
no network latency for CPU savings to amortize against:

    rbd bench --io-type read --io-size 4096 --io-threads 16 \
        --io-pattern rand

Metric         | glibc ptmalloc2 | tcmalloc  | Delta
---------------+-----------------+-----------+--------
IOPS           |         131,201 |   136,389 |  +4.0%
CPU time       |        1,556 ms |  1,439 ms |  -7.5%
Cycles         |           6.74B |     6.06B | -10.1%
Cache misses   |          137.1M |    123.9M |  -9.6%

perf report on the glibc run shows ~8% of CPU in allocator internals
(_int_malloc, cfree, malloc_consolidate, _int_free_*); the same
symbols are barely visible with tcmalloc because the fast path is
just a pointer bump. The Ceph blog [1] reports ~50% IOPS gain on
production clusters where network RTT dominates per-I/O latency --
the 10% CPU savings compound there since the host can push more I/O
into the pipeline during the same wall time.

The series is small:

  1/2 adds the QEMU source patch (0048) with the CONFIG_TCMALLOC
      meson define and the ReleaseFreeMemory() call in
      pve-backup.c's cleanup path.

  2/2 adds libgoogle-perftools-dev to Build-Depends and
      --enable-malloc=tcmalloc to configure.

Runtime dep libgoogle-perftools4t64 (>= 2.16) is picked up
automatically by dh_shlibdeps.

[0]: https://lore.proxmox.com/pve-devel/DHCDIFA0P8QP.2CTY4G4EEGKQ0@proxmox.com/
[1]: https://ceph.io/en/news/blog/2023/reef-freelist-bench/

Kefu Chai (2):
  PVE: use tcmalloc as the memory allocator
  d/rules: enable tcmalloc as the memory allocator

 debian/control                                |  1 +
 ...use-tcmalloc-as-the-memory-allocator.patch | 77 +++++++++++++++++++
 debian/patches/series                         |  1 +
 debian/rules                                  |  1 +
 4 files changed, 80 insertions(+)
 create mode 100644 debian/patches/pve/0048-PVE-use-tcmalloc-as-the-memory-allocator.patch
--
2.47.3