From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH v2 pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 17 Apr 2026 14:33:18 +0200
Message-ID: <a4df3e21-a138-4000-9645-f17f50b88592@proxmox.com>
In-Reply-To: <DHTNKHTCOSVQ.1O7PCGVYVGS8F@proxmox.com>
On 15.04.26 at 12:21 PM, Kefu Chai wrote:
> On Tue Apr 14, 2026 at 11:36 PM CST, Fiona Ebner wrote:
>> On 14.04.26 at 1:08 PM, Fiona Ebner wrote:
>>> Note that I did play around with memory hotplug and ballooning before as
>>> well, not sure if related.
>>>
>>> Unfortunately, I don't have the debug symbols for librbd.so.1 right now:
>>>
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0 0x00007ea8da6442d0 in tc_memalign () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
>>>> [Current thread is 1 (Thread 0x7ea8ca66a6c0 (LWP 109157))]
>
> Hi Fiona,
>
> Thank you for the backtrace.
>
> I dug into the segfault, but was not able to reproduce it locally after
> performing over 3300 snapshot ops on two RBD drives, including
> concurrent and batch ops.
>
> I also searched the internet to see if we are alone. And here is what I
> found:
>
> The crash site (SLL_Next in linked_list.h) is a known pattern in
How do you know that the crash site is there? The trace only shows
tc_memalign(). Telling from the past issues you found, it could be, but
I wouldn't jump to conclusions.
> gperftools. It's what happens when a *prior* operation corrupts a freed
> block's embedded freelist pointer, and a later allocation follows the
> garbage and segfaults. Essentially, tc_memalign() is the victim, not the
> culprit. RHBZ #1430223 [1] and gperftools issues #1036 [2] and #1096 [3]
> all describe the same crash pattern with Ceph. RHBZ #1494309 [4] is also
> worth noting -- tcmalloc didn't intercept aligned_alloc() until
> gperftools 2.6.1-5, causing a mixed-allocator situation where glibc
> allocated but tcmalloc freed. That one's long fixed in our 2.16, but it
> shows this corner of the allocator has had real bugs before.
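
A minimal sketch of the failure mode described above, assuming a toy
fixed-size pool with an embedded singly linked freelist. This is not
gperftools code; every name here (pool, toy_alloc, toy_free, ...) is
hypothetical and only illustrates why the allocation that crashes can be
the victim of an earlier stale write:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 4

/* Hypothetical toy pool, NOT gperftools' implementation. */
static unsigned char pool[NUM_BLOCKS][BLOCK_SIZE];
static void *free_head;

/* A free block stores the pointer to the next free block in its first bytes. */
static void *sll_next(void *block) {
    void *next;
    memcpy(&next, block, sizeof next);
    return next;
}

static void sll_set_next(void *block, void *next) {
    memcpy(block, &next, sizeof next);
}

static void toy_free(void *block) {
    sll_set_next(block, free_head);
    free_head = block;
}

static void *toy_alloc(void) {
    void *block = free_head;
    if (block != NULL)
        free_head = sll_next(block); /* trusts the embedded pointer blindly */
    return block;
}

/* Returns 1 if ptr is NULL or points at a block boundary inside the pool. */
static int is_valid_block(const void *ptr) {
    if (ptr == NULL)
        return 1;
    uintptr_t p = (uintptr_t)ptr, base = (uintptr_t)pool;
    return p >= base && p < base + sizeof pool && (p - base) % BLOCK_SIZE == 0;
}

/* Returns 1 if the freelist head is garbage after a write through a
 * dangling pointer. The sketch stops before the next toy_alloc(), which
 * would dereference that garbage and is where a real crash would show up. */
static int demo_stale_write_corrupts_freelist(void) {
    free_head = NULL;
    for (int i = 0; i < NUM_BLOCKS; i++)
        toy_free(pool[i]);

    void *a = toy_alloc();            /* take a block ... */
    toy_free(a);                      /* ... and give it back */
    memset(a, 0x41, sizeof(void *));  /* bug: write after free clobbers 'next' */

    void *b = toy_alloc();            /* still succeeds and returns the block */
    assert(b == a);
    return !is_valid_block(free_head);
}
```

Note that the allocation right after the stale write still succeeds; only
the one after it would follow the clobbered pointer.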
>
> If it happens again, probably the way to catch the actual corruption
> at its source would be:
>
> LD_PRELOAD=libtcmalloc_debug.so.4 qemu-system-x86_64 ...
This doesn't work unfortunately:
LD_PRELOAD=libtcmalloc_debug.so.4 /usr/bin/kvm \
[I] root@pve9a1 ~# ~/start-vm.sh
Check failed: !internal_init_start_has_run: Heap-check constructor
called twice. Perhaps you both linked in the heap checker, and also
used LD_PRELOAD to load it?
Aborted (core dumped)
I also tried
ln -s libtcmalloc_debug.so.4 libtcmalloc.so.4
but then my VMs wouldn't start even with 'qm start ID --timeout 900'.
Not sure if just too slow or another issue.
>
> This adds guard words around allocations and checks them on free,
> so it'd point straight at whatever is doing the corrupting write.
> This comes with 2-5x overhead, but guess it's fine for debugging.
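
A rough sketch of the guard-word idea, assuming a simple wrapper around
malloc()/free(). The names guarded_malloc/guarded_free and the guard
pattern are hypothetical, and the debug allocator in gperftools is more
elaborate than this. Detection happens when the block is freed, not at
the moment of the corrupting write:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFCAFEF00DULL /* arbitrary pattern (assumption) */

/* Layout of one allocation: [header: size + front guard][user bytes][rear guard]
 * Note: this sketch does not preserve alignment for over-aligned types. */
typedef struct {
    size_t size;
    uint64_t front_guard;
} guard_header;

static void *guarded_malloc(size_t size) {
    unsigned char *raw = malloc(sizeof(guard_header) + size + sizeof(uint64_t));
    if (raw == NULL)
        return NULL;
    guard_header hdr = { size, GUARD_WORD };
    uint64_t rear = GUARD_WORD;
    memcpy(raw, &hdr, sizeof hdr);
    memcpy(raw + sizeof hdr + size, &rear, sizeof rear);
    return raw + sizeof hdr;
}

/* Frees the block; returns 0 if both guards are intact, -1 if either
 * was overwritten while the block was in use. */
static int guarded_free(void *ptr) {
    if (ptr == NULL)
        return 0;
    unsigned char *raw = (unsigned char *)ptr - sizeof(guard_header);
    guard_header hdr;
    memcpy(&hdr, raw, sizeof hdr);
    int ok = (hdr.front_guard == GUARD_WORD);
    if (ok) { /* only trust hdr.size if the front guard survived */
        uint64_t rear;
        memcpy(&rear, raw + sizeof hdr + hdr.size, sizeof rear);
        ok = (rear == GUARD_WORD);
    }
    free(raw);
    return ok ? 0 : -1;
}

/* Writes one byte past an 8-byte buffer; the byte lands in the rear guard. */
static int demo_overflow_detected(void) {
    char *buf = guarded_malloc(8);
    if (buf == NULL)
        return 0;
    memset(buf, 'x', 9);
    return guarded_free(buf);
}
```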
>
> If you manage to reproduce it, I am more than happy to debug it with
> your reproducer.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1430223
> [2] https://github.com/gperftools/gperftools/issues/1036
> [3] https://github.com/gperftools/gperftools/issues/1096
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1494309
>
>>
>> I had added malloc_stats(); calls around
>> MallocExtension_ReleaseFreeMemory(); to better see the effects, which
>> also requires including malloc.h in pve-backup.c when building for
>> tcmalloc. I also did a few backups before, so I can't rule out that it's
>> related to that. I did a build of librbd1 and librados2 with the debug
>> symbols now, but haven't been able to reproduce the issue yet. Will try
>> more tomorrow.
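
For reference, the kind of instrumentation described above could look
roughly like the following. This is only a sketch and not the actual
change to pve-backup.c: the wrapper name is hypothetical, and the weak
declaration is an assumption made so that the snippet also links without
libtcmalloc (a real build would include gperftools/malloc_extension_c.h
and link against tcmalloc instead):

```c
#include <malloc.h> /* malloc_stats(); needed explicitly for this call */
#include <stddef.h>

/* Weak declaration (GCC/Clang extension) so the sketch links even when
 * libtcmalloc is absent; the symbol is then NULL at runtime. */
extern void MallocExtension_ReleaseFreeMemory(void) __attribute__((weak));

/* Hypothetical wrapper: prints allocator statistics to stderr before and
 * after asking tcmalloc to return free memory to the OS.
 * Returns 1 if the release function was available and called, else 0. */
static int release_free_memory_with_stats(void) {
    if (MallocExtension_ReleaseFreeMemory == NULL)
        return 0;
    malloc_stats(); /* state before the release */
    MallocExtension_ReleaseFreeMemory();
    malloc_stats(); /* state after the release */
    return 1;
}
```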
I was sick for 2 days, so I only got around to doing more testing today. I
was not able to trigger any segfaults since that initial one, so let's
hope this was a one-off issue and/or caused by my additional
modification with malloc_stats().