public inbox for pve-devel@lists.proxmox.com
From: Fiona Ebner <f.ebner@proxmox.com>
To: Kefu Chai <k.chai@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH v2 pve-qemu 0/2] Re-enable tcmalloc as the memory allocator
Date: Fri, 17 Apr 2026 14:33:18 +0200	[thread overview]
Message-ID: <a4df3e21-a138-4000-9645-f17f50b88592@proxmox.com> (raw)
In-Reply-To: <DHTNKHTCOSVQ.1O7PCGVYVGS8F@proxmox.com>

On 15.04.26 at 12:21 PM, Kefu Chai wrote:
> On Tue Apr 14, 2026 at 11:36 PM CST, Fiona Ebner wrote:
>> On 14.04.26 at 1:08 PM, Fiona Ebner wrote:
>>> Note that I did play around with memory hotplug and ballooning before as
>>> well, not sure if related.
>>>
>>> Unfortunately, I don't have the debug symbols for librbd.so.1 right now:
>>>
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> #0  0x00007ea8da6442d0 in tc_memalign () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
>>>> [Current thread is 1 (Thread 0x7ea8ca66a6c0 (LWP 109157))]
> 
> Hi Fiona,
> 
> Thank you for the backtrace.
> 
> I dug into the segfault, but was not able to reproduce it locally after
> performing over 3300 snapshot ops on two RBD drives, including
> concurrent and batch ops.
> 
> I also searched the internet to see whether we are alone. Here is what
> I found:
> 
> The crash site (SLL_Next in linked_list.h) is a known pattern in

How do you know that the crash site is there? The trace only shows
tc_memalign(). Judging from the past issues you found, it could be, but
I wouldn't jump to conclusions.

> gperftools. It's what happens when a *prior* operation corrupts a freed
> block's embedded freelist pointer, and a later allocation follows the
> garbage and segfaults. Essentially, tc_memalign() is the victim, not the
> culprit. RHBZ #1430223 [1] and gperftools issues #1036 [2] and #1096 [3]
> all describe the same crash pattern with Ceph. RHBZ #1494309 [4] is also
> worth noting -- tcmalloc didn't intercept aligned_alloc() until
> gperftools 2.6.1-5, causing a mixed-allocator situation where glibc
> allocated but tcmalloc freed. That one's long fixed in our 2.16, but it
> shows this corner of the allocator has had real bugs before.
> 
> If it happens again, probably the way to catch the actual corruption
> at its source would be:
> 
>   LD_PRELOAD=libtcmalloc_debug.so.4 qemu-system-x86_64 ...

This doesn't work unfortunately. With

LD_PRELOAD=libtcmalloc_debug.so.4 /usr/bin/kvm \

in my start script, I get:

[I] root@pve9a1 ~# ~/start-vm.sh
Check failed: !internal_init_start_has_run: Heap-check constructor
called twice.  Perhaps you both linked in the heap checker, and also
used LD_PRELOAD to load it?
Aborted (core dumped)

I also tried
ln -s libtcmalloc_debug.so.4 libtcmalloc.so.4
but then my VMs wouldn't start even with 'qm start ID --timeout 900'.
Not sure whether it was just too slow or a different issue.

> 
> This adds guard words around allocations and checks them on free,
> so it'd point straight at whatever is doing the corrupting write.
> This comes with 2-5x overhead, but I guess that's fine for debugging.
> 
> If you manage to reproduce it, I am more than happy to debug it with
> your reproducer.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1430223
> [2] https://github.com/gperftools/gperftools/issues/1036
> [3] https://github.com/gperftools/gperftools/issues/1096
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1494309
> 
>>
>> I had added malloc_stats(); calls around
>> MallocExtension_ReleaseFreeMemory(); to better see the effects, which
>> also requires including malloc.h in pve-backup.c when building for
>> tcmalloc. I also did a few backups before, so I can't rule out that it's
>> related to that. I did a build of librbd1 and librados2 with the debug
>> symbols now, but haven't been able to reproduce the issue yet. Will try
>> more tomorrow. 

I was sick for 2 days, so I only got around to doing more testing today.
I was not able to trigger any segfaults since that initial one, so let's
hope this was a one-off issue and/or caused by my additional
modification with malloc_stats().





Thread overview: 8+ messages
2026-04-14  5:46 Kefu Chai
2026-04-14  5:46 ` [PATCH v2 pve-qemu 1/2] add patch to support using " Kefu Chai
2026-04-14  9:53   ` Fiona Ebner
2026-04-14  5:46 ` [PATCH v2 pve-qemu 2/2] d/rules: enable " Kefu Chai
2026-04-14 11:08 ` [PATCH v2 pve-qemu 0/2] Re-enable " Fiona Ebner
2026-04-14 15:36   ` Fiona Ebner
2026-04-15 10:22     ` Kefu Chai
2026-04-17 12:33       ` Fiona Ebner [this message]
