From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH v2 pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 17 Apr 2026 14:33:18 +0200
Message-ID: <a4df3e21-a138-4000-9645-f17f50b88592@proxmox.com>
In-Reply-To: <DHTNKHTCOSVQ.1O7PCGVYVGS8F@proxmox.com>
Am 15.04.26 um 12:21 PM schrieb Kefu Chai:
> On Tue Apr 14, 2026 at 11:36 PM CST, Fiona Ebner wrote:
>> Am 14.04.26 um 1:08 PM schrieb Fiona Ebner:
>>> Note that I did play around with memory hotplug and ballooning before as
>>> well, not sure if related.
>>>
>>> Unfortunately, I don't have the debug symbols for librbd.so.1 right now:
>>>
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0 0x00007ea8da6442d0 in tc_memalign () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
>>>> [Current thread is 1 (Thread 0x7ea8ca66a6c0 (LWP 109157))]
>
> Hi Fiona,
>
> Thank you for the backtrace.
>
> I dug into the segfault, but was not able to reproduce it locally after
> performing over 3300 snapshot ops on two RBD drives, including
> concurrent and batch ops.
>
> I also searched the internet to see whether others have hit this. Here
> is what I found:
>
> The crash site (SLL_Next in linked_list.h) is a known pattern in
How do you know that the crash site is there? The trace only shows
tc_memalign(). Judging by the past issues you found, it could be, but
I wouldn't jump to conclusions.
> gperftools. It's what happens when a *prior* operation corrupts a freed
> block's embedded freelist pointer, and a later allocation follows the
> garbage and segfaults. Essentially, tc_memalign() is the victim, not the
> culprit. RHBZ #1430223 [1] and gperftools issues #1036 [2] and #1096 [3]
> all describe the same crash pattern with Ceph. RHBZ #1494309 [4] is also
> worth noting -- tcmalloc didn't intercept aligned_alloc() until
> gperftools 2.6.1-5, causing a mixed-allocator situation where glibc
> allocated but tcmalloc freed. That one's long fixed in our 2.16, but it
> shows this corner of the allocator has had real bugs before.
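To make the failure mode described above concrete, here is a minimal
sketch in C. It is a toy fixed-size pool, not gperftools code: all names
(toy_pool, toy_alloc, toy_demo, ...) are hypothetical, and the in-pool
check exists only so the example reports the corruption instead of
crashing. It assumes nothing about what actually happened in the
reported segfault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 4

/* Toy pool; all names here are hypothetical, not gperftools APIs. */
unsigned char toy_pool[NUM_BLOCKS][BLOCK_SIZE];
void *toy_free_head;

/* A free block stores the pointer to the next free block in its first bytes. */
void *toy_next(void *block)
{
    void *next;
    memcpy(&next, block, sizeof next);
    return next;
}

/* Push a block onto the free list, embedding the old head inside it. */
void toy_free(void *block)
{
    memcpy(block, &toy_free_head, sizeof toy_free_head);
    toy_free_head = block;
}

void toy_init(void)
{
    toy_free_head = NULL;
    for (int i = NUM_BLOCKS - 1; i >= 0; i--)
        toy_free(toy_pool[i]);
}

/* True if p is the start of a block inside the pool. */
int toy_in_pool(const void *p)
{
    uintptr_t addr = (uintptr_t)p, base = (uintptr_t)toy_pool;
    return addr >= base && addr < base + sizeof toy_pool
        && (addr - base) % BLOCK_SIZE == 0;
}

/* Pop the list head. Sets *corrupt instead of dereferencing a garbage
 * head; a real allocator has no such check and would fault here. */
void *toy_alloc(int *corrupt)
{
    *corrupt = 0;
    if (toy_free_head == NULL)
        return NULL;
    if (!toy_in_pool(toy_free_head)) {
        *corrupt = 1;
        return NULL;
    }
    void *block = toy_free_head;
    toy_free_head = toy_next(block);
    return block;
}

/* Returns 1 if the last allocation hit a corrupted list, else 0. */
int toy_demo(int stale_write)
{
    int corrupt;
    toy_init();
    void *a = toy_alloc(&corrupt);
    toy_free(a);
    if (stale_write)
        memset(a, 0x41, sizeof(void *)); /* the culprit: write after free */
    void *b = toy_alloc(&corrupt);       /* succeeds, but loads garbage as new head */
    assert(b == a && !corrupt);
    (void)toy_alloc(&corrupt);           /* the victim: follows the garbage head */
    return corrupt;
}
```

toy_demo(1) returns 1 and toy_demo(0) returns 0. Note that with the
stale write, the allocation right after it still succeeds; only the one
after that trips over the garbage pointer, which is why the faulting
frame alone says little about where the corrupting write came from.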
>
> If it happens again, probably the way to catch the actual corruption
> at its source would be:
>
> LD_PRELOAD=libtcmalloc_debug.so.4 qemu-system-x86_64 ...
This doesn't work unfortunately:
LD_PRELOAD=libtcmalloc_debug.so.4 /usr/bin/kvm \
[I] root@pve9a1 ~# ~/start-vm.sh
Check failed: !internal_init_start_has_run: Heap-check constructor
called twice. Perhaps you both linked in the heap checker, and also
used LD_PRELOAD to load it?
Aborted (core dumped)
I also tried
ln -s libtcmalloc_debug.so.4 libtcmalloc.so.4
but then my VMs wouldn't start even with 'qm start ID --timeout 900'.
Not sure if just too slow or another issue.
>
> This adds guard words around allocations and checks them on free,
> so it'd point straight at whatever is doing the corrupting write.
> This comes with 2-5x overhead, but I guess that's fine for debugging.
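The guard-word idea can be sketched in a few lines of C. This is only an
illustration of the general technique, not how libtcmalloc_debug is
implemented: the wrapper names (guarded_malloc, guarded_free), the guard
value and the block layout are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary marker value chosen for this sketch. */
#define GUARD 0xDEADBEEFCAFEF00DULL

/* Layout of one allocation: [size_t size][guard][user bytes][guard] */
void *guarded_malloc(size_t size)
{
    uint64_t guard = GUARD;
    unsigned char *raw = malloc(sizeof size + 2 * sizeof guard + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    memcpy(raw + sizeof size, &guard, sizeof guard);
    memcpy(raw + sizeof size + sizeof guard + size, &guard, sizeof guard);
    return raw + sizeof size + sizeof guard;
}

/* Frees the block and reports guard damage:
 * 0 = intact, -1 = front guard overwritten, 1 = rear guard overwritten. */
int guarded_free(void *ptr)
{
    unsigned char *user = ptr;
    unsigned char *raw = user - sizeof(uint64_t) - sizeof(size_t);
    size_t size;
    uint64_t front, rear;

    memcpy(&size, raw, sizeof size);
    memcpy(&front, raw + sizeof size, sizeof front);
    memcpy(&rear, user + size, sizeof rear);
    free(raw);

    if (front != GUARD)
        return -1;
    if (rear != GUARD)
        return 1;
    return 0;
}
```

A 16-byte allocation that has 17 bytes written to it makes
guarded_free() return 1. A check like this fires when the damaged block
is freed, not at the moment of the bad write, and it only sees writes
that touch the guard words next to the block.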
>
> If you manage to reproduce it, I am more than happy to debug it with
> your reproducer.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1430223
> [2] https://github.com/gperftools/gperftools/issues/1036
> [3] https://github.com/gperftools/gperftools/issues/1096
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1494309
>
>>
>> I had added malloc_stats(); calls around
>> MallocExtension_ReleaseFreeMemory(); to better see the effects, which
>> also requires including malloc.h in pve-backup.c when building for
>> tcmalloc. I also did a few backups before, so I can't rule out that it's
>> related to that. I did a build of librbd1 and librados2 with the debug
>> symbols now, but haven't been able to reproduce the issue yet. Will try
>> more tomorrow.
I was sick for 2 days, so I only got around to doing more testing today. I
was not able to trigger any segfaults since that initial one, so let's
hope this was a one-off issue and/or caused by my additional
modification with malloc_stats().