From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 10 Apr 2026 10:12:45 +0200
Message-ID: <b1db5ad4-b3aa-4980-afd6-56cce2120fad@proxmox.com>
In-Reply-To: <20260410043027.3621673-1-k.chai@proxmox.com>
On 10.04.26 at 6:30 AM, Kefu Chai wrote:
> Following up on the RFC thread [0], here's the formal submission to
> re-enable tcmalloc for pve-qemu.
>
> Quick recap: librbd's I/O path allocates a lot of small, short-lived
> objects with plain new/malloc (ObjectReadRequest, bufferlist, etc.),
> and glibc's ptmalloc2 handles this pattern poorly -- cross-thread
> arena contention and cache-line bouncing show up clearly in perf
> profiles. tcmalloc's per-thread fast path avoids both.
>
> A bit of history for context: tcmalloc was tried in 2015 but dropped
> after 8 days due to gperftools 2.2 tuning issues (fixed in 2.4+).
> jemalloc replaced it but was dropped in 2020 because it didn't
> release Rust-allocated memory (from proxmox-backup-qemu) back to the
> OS. PVE 9 ships gperftools 2.16, and patch 1/2 addresses the
> reclamation gap explicitly.
>
> On Dietmar's two concerns from the RFC:
>
> "Could ReleaseFreeMemory() halt the application?" -- No, and I
> verified this directly. It walks tcmalloc's page heap free span
> lists and calls madvise(MADV_DONTNEED) on each span. It does not
> walk allocated memory or compact the heap. A standalone test
> reclaimed 386 MB of 410 MB cached memory (94%) in effectively zero
> wall time. The call runs once at backup completion, same spot where
> malloc_trim runs today.
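
For readers unfamiliar with the mechanism described above, here is a minimal sketch of what madvise(MADV_DONTNEED) does to a single free span. It is not tcmalloc's code: the span is simulated with a private anonymous mapping, the names release_span and SPAN_PAGES are made up for the illustration, and it assumes Linux with Python 3.8 or newer.

```python
import mmap

PAGE = mmap.PAGESIZE
SPAN_PAGES = 256  # hypothetical span size, chosen only for the illustration


def release_span(span: mmap.mmap) -> None:
    """Hand the span's physical pages back to the kernel.

    The virtual address range stays mapped; only the backing pages are
    dropped. Nothing outside this span is touched.
    """
    span.madvise(mmap.MADV_DONTNEED)


# Stand-in for a free span sitting in an allocator's page heap:
# a private anonymous mapping whose pages have all been touched.
span = mmap.mmap(-1, SPAN_PAGES * PAGE,
                 flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
span[:] = b"\xff" * len(span)
first_byte_before = span[0]

release_span(span)

# On Linux, a private anonymous page dropped with MADV_DONTNEED is
# zero-filled on the next access, so the old contents are gone.
first_byte_after = span[0]
size_after = len(span)
```

The call is per-span and does no scanning or copying of live data, which is consistent with the claim that the release does not walk allocated memory or compact the heap.
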
>
> "Wouldn't a pool allocator in librbd be the proper fix?" -- In
> principle yes, but I audited librbd in Ceph squid and it does NOT
> use a pool allocator -- all I/O path objects go through plain new.
> Ceph's mempool is tracking-only, not actual pooling. Adding real
> pooling would be a significant Ceph-side change (submission and
> completion happen on different threads), and it's orthogonal to the
> allocator choice here.
>
> Also thanks to Alexandre for confirming the 2015 gperftools issues
> are long resolved.
>
> Test results
> ------------
>
> Benchmarked on a local vstart Ceph cluster (3 OSDs on local NVMe).
> This is the worst case for showing allocator impact, since there's
> no network latency for CPU savings to amortize against:
>
>   rbd bench --io-type read --io-size 4096 --io-threads 16 \
>       --io-pattern rand
>
> Metric         | glibc ptmalloc2 | tcmalloc  | Delta
> ---------------+-----------------+-----------+--------
> IOPS           | 131,201         | 136,389   | +4.0%
> CPU time       | 1,556 ms        | 1,439 ms  | -7.5%
> Cycles         | 6.74B           | 6.06B     | -10.1%
> Cache misses   | 137.1M          | 123.9M    | -9.6%
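
As a quick sanity check, the Delta column can be re-derived from the two raw columns. The sketch below assumes the delta is the relative change against the glibc baseline, (tcmalloc - glibc) / glibc, rounded to one decimal; the figures are copied from the table and the names are illustrative only.

```python
# (glibc ptmalloc2, tcmalloc) pairs copied from the benchmark table.
rows = {
    "IOPS": (131_201, 136_389),
    "CPU time (ms)": (1_556, 1_439),
    "Cycles (billions)": (6.74, 6.06),
    "Cache misses (millions)": (137.1, 123.9),
}


def percent_delta(baseline: float, candidate: float) -> float:
    """Relative change of candidate against baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


deltas = {name: round(percent_delta(glibc, tcm), 1)
          for name, (glibc, tcm) in rows.items()}

for name, delta in deltas.items():
    print(f"{name:<24} {delta:+.1f}%")
```

Under that assumption the computed values match the table: +4.0%, -7.5%, -10.1% and -9.6%.
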
>
> perf report on the glibc run shows ~8% of CPU in allocator internals
> (_int_malloc, cfree, malloc_consolidate, _int_free_*); the same
> symbols are barely visible with tcmalloc because the fast path is
> just a pointer bump. The Ceph blog [1] reports ~50% IOPS gain on
> production clusters where network RTT dominates per-I/O latency --
> the 10% CPU savings compound there since the host can push more I/O
> into the pipeline during the same wall time.

How does performance change when doing I/O from within a QEMU guest?

How does this affect performance for other storage types, such as ZFS,
qcow2 on top of directory-based storage, qcow2 on top of LVM, LVM-thin,
etc., and for other workloads, such as saving VM state during a
snapshot, transfer during migration, possibly memory
hotplug/ballooning, and network performance for vNICs?