From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Alexandre DERUMIER <aderumier@odiso.com>
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
Date: Tue, 15 Sep 2020 16:32:52 +0200 [thread overview]
Message-ID: <6b680921-12d0-006b-6d04-bbe1c4bb04f8@proxmox.com> (raw)
In-Reply-To: <1798333820.838842.1600178990068.JavaMail.zimbra@odiso.com>
On 9/15/20 4:09 PM, Alexandre DERUMIER wrote:
>>> Can you try to give pmxcfs real time scheduling, e.g., by doing:
>>>
>>> # systemctl edit pve-cluster
>>>
>>> And then add snippet:
>>>
>>>
>>> [Service]
>>> CPUSchedulingPolicy=rr
>>> CPUSchedulingPriority=99
> yes, sure, I'll do it now
>
>
>> I'm currently digging the logs
>>> Is your most simplest/stable reproducer still a periodic restart of corosync in one node?
> yes, a simple "systemctl restart corosync" on 1 node each minute
>
>
>
> After 1hour, it's still locked.
>
> on other nodes, I still have pmxfs logs like:
>
I mean this is bad, but also great!
Cam you do a coredump of the whole thing and upload it somewhere with the version info
used (for dbgsym package)? That could help a lot.
> manual "pmxcfs -d"
> https://gist.github.com/aderumier/4cd91d17e1f8847b93ea5f621f257c2e
>
Hmm, the fuse connection of the previous one got into a weird state (or something is still
running) but I'd rather say this is a side-effect not directly connected to the real bug.
>
> some interesting dmesg about "pvesr"
>
> [Tue Sep 15 14:45:34 2020] INFO: task pvesr:19038 blocked for more than 120 seconds.
> [Tue Sep 15 14:45:34 2020] Tainted: P O 5.4.60-1-pve #1
> [Tue Sep 15 14:45:34 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Sep 15 14:45:34 2020] pvesr D 0 19038 1 0x00000080
> [Tue Sep 15 14:45:34 2020] Call Trace:
> [Tue Sep 15 14:45:34 2020] __schedule+0x2e6/0x6f0
> [Tue Sep 15 14:45:34 2020] ? filename_parentat.isra.57.part.58+0xf7/0x180
> [Tue Sep 15 14:45:34 2020] schedule+0x33/0xa0
> [Tue Sep 15 14:45:34 2020] rwsem_down_write_slowpath+0x2ed/0x4a0
> [Tue Sep 15 14:45:34 2020] down_write+0x3d/0x40
> [Tue Sep 15 14:45:34 2020] filename_create+0x8e/0x180
> [Tue Sep 15 14:45:34 2020] do_mkdirat+0x59/0x110
> [Tue Sep 15 14:45:34 2020] __x64_sys_mkdir+0x1b/0x20
> [Tue Sep 15 14:45:34 2020] do_syscall_64+0x57/0x190
> [Tue Sep 15 14:45:34 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
hmm, hangs in mkdir (cluster wide locking)
next prev parent reply other threads:[~2020-09-15 14:33 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-03 14:11 Alexandre DERUMIER
2020-09-04 12:29 ` Alexandre DERUMIER
2020-09-04 15:42 ` Dietmar Maurer
2020-09-05 13:32 ` Alexandre DERUMIER
2020-09-05 15:23 ` dietmar
2020-09-05 17:30 ` Alexandre DERUMIER
2020-09-06 4:21 ` dietmar
2020-09-06 5:36 ` Alexandre DERUMIER
2020-09-06 6:33 ` Alexandre DERUMIER
2020-09-06 8:43 ` Alexandre DERUMIER
2020-09-06 12:14 ` dietmar
2020-09-06 12:19 ` dietmar
2020-09-07 7:00 ` Thomas Lamprecht
2020-09-07 7:19 ` Alexandre DERUMIER
2020-09-07 8:18 ` dietmar
2020-09-07 9:32 ` Alexandre DERUMIER
2020-09-07 13:23 ` Alexandre DERUMIER
2020-09-08 4:41 ` dietmar
2020-09-08 7:11 ` Alexandre DERUMIER
2020-09-09 20:05 ` Thomas Lamprecht
2020-09-10 4:58 ` Alexandre DERUMIER
2020-09-10 8:21 ` Thomas Lamprecht
2020-09-10 11:34 ` Alexandre DERUMIER
2020-09-10 18:21 ` Thomas Lamprecht
2020-09-14 4:54 ` Alexandre DERUMIER
2020-09-14 7:14 ` Dietmar Maurer
2020-09-14 8:27 ` Alexandre DERUMIER
2020-09-14 8:51 ` Thomas Lamprecht
2020-09-14 15:45 ` Alexandre DERUMIER
2020-09-15 5:45 ` dietmar
2020-09-15 6:27 ` Alexandre DERUMIER
2020-09-15 7:13 ` dietmar
2020-09-15 8:42 ` Alexandre DERUMIER
2020-09-15 9:35 ` Alexandre DERUMIER
2020-09-15 9:46 ` Thomas Lamprecht
2020-09-15 10:15 ` Alexandre DERUMIER
2020-09-15 11:04 ` Alexandre DERUMIER
2020-09-15 12:49 ` Alexandre DERUMIER
2020-09-15 13:00 ` Thomas Lamprecht
2020-09-15 14:09 ` Alexandre DERUMIER
2020-09-15 14:19 ` Alexandre DERUMIER
2020-09-15 14:32 ` Thomas Lamprecht [this message]
2020-09-15 14:57 ` Alexandre DERUMIER
2020-09-15 15:58 ` Alexandre DERUMIER
2020-09-16 7:34 ` Alexandre DERUMIER
2020-09-16 7:58 ` Alexandre DERUMIER
2020-09-16 8:30 ` Alexandre DERUMIER
2020-09-16 8:53 ` Alexandre DERUMIER
[not found] ` <1894376736.864562.1600253445817.JavaMail.zimbra@odiso.com>
2020-09-16 13:15 ` Alexandre DERUMIER
2020-09-16 14:45 ` Thomas Lamprecht
2020-09-16 15:17 ` Alexandre DERUMIER
2020-09-17 9:21 ` Fabian Grünbichler
2020-09-17 9:59 ` Alexandre DERUMIER
2020-09-17 10:02 ` Alexandre DERUMIER
2020-09-17 11:35 ` Thomas Lamprecht
2020-09-20 23:54 ` Alexandre DERUMIER
2020-09-22 5:43 ` Alexandre DERUMIER
2020-09-24 14:02 ` Fabian Grünbichler
2020-09-24 14:29 ` Alexandre DERUMIER
2020-09-24 18:07 ` Alexandre DERUMIER
2020-09-25 6:44 ` Alexandre DERUMIER
2020-09-25 7:15 ` Alexandre DERUMIER
2020-09-25 9:19 ` Fabian Grünbichler
2020-09-25 9:46 ` Alexandre DERUMIER
2020-09-25 12:51 ` Fabian Grünbichler
2020-09-25 16:29 ` Alexandre DERUMIER
2020-09-28 9:17 ` Fabian Grünbichler
2020-09-28 9:35 ` Alexandre DERUMIER
2020-09-28 15:59 ` Alexandre DERUMIER
2020-09-29 5:30 ` Alexandre DERUMIER
2020-09-29 8:51 ` Fabian Grünbichler
2020-09-29 9:37 ` Alexandre DERUMIER
2020-09-29 10:52 ` Alexandre DERUMIER
2020-09-29 11:43 ` Alexandre DERUMIER
2020-09-29 11:50 ` Alexandre DERUMIER
2020-09-29 13:28 ` Fabian Grünbichler
2020-09-29 13:52 ` Alexandre DERUMIER
2020-09-30 6:09 ` Alexandre DERUMIER
2020-09-30 6:26 ` Thomas Lamprecht
2020-09-15 7:58 ` Thomas Lamprecht
2020-12-29 14:21 ` Josef Johansson
2020-09-04 15:46 ` Alexandre DERUMIER
2020-09-30 15:50 ` Thomas Lamprecht
2020-10-15 9:16 ` Eneko Lacunza
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6b680921-12d0-006b-6d04-bbe1c4bb04f8@proxmox.com \
--to=t.lamprecht@proxmox.com \
--cc=aderumier@odiso.com \
--cc=pve-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal