public inbox for pve-devel@lists.proxmox.com
From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 10 Apr 2026 10:12:45 +0200	[thread overview]
Message-ID: <b1db5ad4-b3aa-4980-afd6-56cce2120fad@proxmox.com> (raw)
In-Reply-To: <20260410043027.3621673-1-k.chai@proxmox.com>

On 10.04.26 at 6:30 AM, Kefu Chai wrote:
> Following up on the RFC thread [0], here's the formal submission to
> re-enable tcmalloc for pve-qemu.
> 
> Quick recap: librbd's I/O path allocates a lot of small, short-lived
> objects with plain new/malloc (ObjectReadRequest, bufferlist, etc.),
> and glibc's ptmalloc2 handles this pattern poorly -- cross-thread
> arena contention and cache-line bouncing show up clearly in perf
> profiles. tcmalloc's per-thread fast path avoids both.
>
> A bit of history for context: tcmalloc was tried in 2015 but dropped
> after 8 days due to gperftools 2.2 tuning issues (fixed in 2.4+).
> jemalloc replaced it but was dropped in 2020 because it didn't
> release Rust-allocated memory (from proxmox-backup-qemu) back to the
> OS. PVE 9 ships gperftools 2.16, and patch 1/2 addresses the
> reclamation gap explicitly.
> 
> On Dietmar's two concerns from the RFC:
> 
> "Could ReleaseFreeMemory() halt the application?" -- No, and I
> verified this directly. It walks tcmalloc's page heap free span
> lists and calls madvise(MADV_DONTNEED) on each span. It does not
> walk allocated memory or compact the heap. A standalone test
> reclaimed 386 MB of 410 MB cached memory (94%) in effectively zero
> wall time. The call runs once at backup completion, same spot where
> malloc_trim runs today.
> 
> "Wouldn't a pool allocator in librbd be the proper fix?" -- In
> principle yes, but I audited librbd in Ceph squid and it does NOT
> use a pool allocator -- all I/O path objects go through plain new.
> Ceph's mempool is tracking-only, not actual pooling. Adding real
> pooling would be a significant Ceph-side change (submission and
> completion happen on different threads), and it's orthogonal to the
> allocator choice here.
> 
> Also thanks to Alexandre for confirming the 2015 gperftools issues
> are long resolved.
> 
> Test results
> ------------
> 
> Benchmarked on a local vstart Ceph cluster (3 OSDs on local NVMe).
> This is the worst case for showing allocator impact, since there's
> no network latency for CPU savings to amortize against:
> 
>   rbd bench --io-type read --io-size 4096 --io-threads 16 \
>             --io-pattern rand
> 
>   Metric         | glibc ptmalloc2 | tcmalloc  | Delta
>   ---------------+-----------------+-----------+--------
>   IOPS           |         131,201 |   136,389 |  +4.0%
>   CPU time       |        1,556 ms |  1,439 ms |  -7.5%
>   Cycles         |           6.74B |     6.06B | -10.1%
>   Cache misses   |          137.1M |    123.9M |  -9.6%
> 
> perf report on the glibc run shows ~8% of CPU in allocator internals
> (_int_malloc, cfree, malloc_consolidate, _int_free_*); the same
> symbols are barely visible with tcmalloc because the fast path is
> just a pointer bump. The Ceph blog [1] reports ~50% IOPS gain on
> production clusters where network RTT dominates per-I/O latency --
> the 10% CPU savings compound there since the host can push more I/O
> into the pipeline during the same wall time.

How does the performance change when doing IO within a QEMU guest?

How does this affect performance for other storage types (ZFS, qcow2 on
top of directory-based storages, qcow2 on top of LVM, LVM-thin, etc.)
and for other workloads, like saving VM state during snapshot, RAM
transfer during migration, maybe memory hotplug/ballooning, or network
performance for vNICs?





Thread overview: 5+ messages
2026-04-10  4:30 Kefu Chai
2026-04-10  4:30 ` [PATCH pve-qemu 1/2] PVE: use " Kefu Chai
2026-04-10  4:30 ` [PATCH pve-qemu 2/2] d/rules: enable " Kefu Chai
2026-04-10  8:12 ` Fiona Ebner [this message]
2026-04-10 10:45   ` [PATCH pve-qemu 0/2] Re-enable " DERUMIER, Alexandre
