From: Gilberto Ferreira <gilberto.nunes32@gmail.com>
To: Stefan Radman <stefan.radman@me.com>
Cc: PVE User List <pve-user@pve.proxmox.com>
Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
Date: Tue, 2 Apr 2024 08:15:00 -0300
Message-ID: <CAOKSTButEg+8Xi-3tkyJQY9JgkG3ywq8x=9iX7Pn9vbrSUxj_w@mail.gmail.com>
In-Reply-To: <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
Good
On Tue, 2 Apr 2024 at 08:09, Stefan Radman <stefan.radman@me.com>
wrote:
> Workaround: No more kernel panics on reboot when pinning kernel
> 6.2.16-20-pve.
>
> Affected kernels:
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> The original issue [1] was solved long ago [2] but apparently
> re-introduced recently [3].
>
> The regression [4] is being discussed on kernel.org.
>
> Looks like a back and forth in the tg3 driver.
>
> Note that the kernel panic is only triggered by “reboot” and not by
> “shutdown”.
>
> Stefan
>
> root@per740:~# proxmox-boot-tool kernel list
> Manually selected kernels:
> None.
>
> Automatically selected kernels:
> 6.2.16-20-pve
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> Pinned kernel:
> 6.2.16-20-pve
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.2.16-20-pve)
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] tg3: Disable tg3 device on system reboot to avoid triggering AER
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>
> [3] tg3: power down device only on SYSTEM_POWER_OFF
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9fc3bc7643341dc5be7d269f3d3dbe441d8d7ac3
>
> [4] * [PATCH] tg3: add new module param to force device power down on
> reboot
>
> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
>
> On Apr 2, 2024, at 09:37, Gilberto Ferreira <gilberto.nunes32@gmail.com>
> wrote:
>
> Perhaps you should try a kernel other than 6.5, such as 6.2.
>
> On Tue, 2 Apr 2024 at 02:43, Stefan Radman via pve-user <
> pve-user@lists.proxmox.com> wrote:
>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Stefan Radman <stefan.radman@me.com>
>> To: Proxmox VE user list <pve-user@lists.proxmox.com>
>> Cc: PVE User List <pve-user@pve.proxmox.com>
>> Bcc:
>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
>> R740.
>>
>> Again, the kernel panic was triggered by a BCM5720 running Broadcom
>> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>>
>> R740 BIOS 2.21.2 (but also happened with 2.20.1)
>>
>> Stefan
>>
>> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
>> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic
>> Hardware Error Source: 5
>> [1325589.991223] {1}[Hardware Error]: event severity: fatal
>> [1325589.991225] {1}[Hardware Error]: Error 0, type: fatal
>> [1325589.991227] {1}[Hardware Error]: section_type: PCIe error
>> [1325589.991228] {1}[Hardware Error]: port_type: 0, PCIe end point
>> [1325589.991231] {1}[Hardware Error]: version: 3.0
>> [1325589.991233] {1}[Hardware Error]: command: 0x0002, status: 0x0010
>> [1325589.991235] {1}[Hardware Error]: device_id: 0000:01:00.1
>> [1325589.991237] {1}[Hardware Error]: slot: 0
>> [1325589.991239] {1}[Hardware Error]: secondary_bus: 0x00
>> [1325589.991240] {1}[Hardware Error]: vendor_id: 0x14e4, device_id:
>> 0x165f
>> [1325589.991242] {1}[Hardware Error]: class_code: 020000
>> [1325589.991244] {1}[Hardware Error]: aer_uncor_status: 0x00100000,
>> aer_uncor_mask: 0x00010000
>> [1325589.991246] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
>> [1325589.991248] {1}[Hardware Error]: TLP Header: 40000001 0000010f
>> 90028090 00000000
>> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
>> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O
>> 6.5.13-1-pve #1
>> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS
>> 2.20.1 09/13/2023
>> [1325589.991259] Call Trace:
>> [1325589.991261] <NMI>
>>
>> root@per740:~# pveversion
>> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>>
>> root@per740:~# ethtool -i eno4
>> driver: tg3
>> version: 6.5.13-3-pve
>> firmware-version: FFV22.71.3 bc 5720-v1.39
>> expansion-rom-version:
>> bus-info: 0000:01:00.1
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: no
>>
>>
>> > On Mar 28, 2024, at 15:50, Stefan Radman via pve-user <
>> pve-user@lists.proxmox.com> wrote:
>> >
>> >
>> > From: Stefan Radman <stefan.radman@me.com>
>> > Subject: 6.5.13-3-pve kernel panic on shutdown
>> > Date: March 28, 2024 at 15:50:02 GMT+1
>> > To: PVE User List <pve-user@pve.proxmox.com>
>> >
>> >
>> > I recently noticed that a Dell PowerEdge R540 currently running Proxmox
>> > VE 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>> >
>> > The kernel panic is triggered 3-4 seconds after the last network
>> interface goes down (onboard BCM5720 LOM), while the system enters S5
>> (sleep) state.
>> >
>> > [84459.970212] bond0: (slave eno1): link status definitely down,
>> disabling slave
>> > [84459.982170] bond0: (slave eno2): link status definitely down,
>> disabling slave
>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>> > [84460.001615] bond0: now running without any active interface!
>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic
>> Hardware Error Source: 5
>> >
>> > This is reproducible on every reboot.
>> >
>> > R540 and BCM5720 are running the latest firmware available from the
>> Dell support website.
>> >
>> > Link [2] below seems to suggest that my problem is related to a
>> > combination of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>> >
>> > Has anyone else seen this lately (or ever) with Proxmox VE?
>> >
>> > Thank you
>> >
>> > Stefan
>> >
>> > [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>> >
>> > [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot"
>> which causes Bus Fatal Error when rebooting system with BCM5720 NIC
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>> >
>> > [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
>> >
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>> >
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>> >
>> > [4] * [PATCH] tg3: Disable tg3 device on system reboot to avoid
>> triggering AER
>> >
>> https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/
>> >
>> > [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
>> >
>> https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/
>> >
>> > [6] * [PATCH] tg3: add new module param to force device power down on
>> reboot
>> >
>> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>> >
>> >
>> > [84458.600189] systemd-shutdown[1]: Syncing filesystems and block
>> devices.
>> > [84458.607141] systemd-shutdown[1]: Rebooting.
>> > [84458.612283] spi-nor spi0.0: Software reset failed: -524
>> > [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion
>> is called outbound_intr_mask:0x40000009
>> > [84459.970212] bond0: (slave eno1): link status definitely down,
>> disabling slave
>> > [84459.982170] bond0: (slave eno2): link status definitely down,
>> disabling slave
>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>> > [84460.001615] bond0: now running without any active interface!
>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic
>> Hardware Error Source: 5
>> > [84463.685116] {1}[Hardware Error]: event severity: fatal
>> > [84463.685117] {1}[Hardware Error]: Error 0, type: fatal
>> > [84463.685119] {1}[Hardware Error]: section_type: PCIe error
>> > [84463.685120] {1}[Hardware Error]: port_type: 0, PCIe end point
>> > [84463.685121] {1}[Hardware Error]: version: 3.0
>> > [84463.685122] {1}[Hardware Error]: command: 0x0002, status: 0x0010
>> > [84463.685123] {1}[Hardware Error]: device_id: 0000:04:00.1
>> > [84463.685125] {1}[Hardware Error]: slot: 0
>> > [84463.685126] {1}[Hardware Error]: secondary_bus: 0x00
>> > [84463.685127] {1}[Hardware Error]: vendor_id: 0x14e4, device_id:
>> 0x165f
>> > [84463.685128] {1}[Hardware Error]: class_code: 020000
>> > [84463.685129] {1}[Hardware Error]: aer_uncor_status: 0x00100000,
>> aer_uncor_mask: 0x00010000
>> > [84463.685130] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
>> > [84463.685131] {1}[Hardware Error]: TLP Header: 40000001 0000010f
>> 90028090 00000000
>> > [84463.685134] Kernel panic - not syncing: Fatal hardware error!
>> > [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P
>> O 6.5.13-3-pve #1
>> > [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS
>> 2.21.1 03/07/2024
>> > [84463.685140] Call Trace:
>> > [84463.685142] <NMI>
>> > …
>> >
>> > root@pve:~# pveversion
>> > pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
>> > root@pve:~# ethtool -i eno2
>> > driver: tg3
>> > version: 6.5.13-3-pve
>> > firmware-version: FFV22.71.3 bc 5720-v1.39
>> > expansion-rom-version:
>> > bus-info: 0000:04:00.1
>> > supports-statistics: yes
>> > supports-test: yes
>> > supports-eeprom-access: yes
>> > supports-register-dump: yes
>> > supports-priv-flags: no
>> > root@pve:~# lspci | fgrep 04:00.1
>> > 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
>> BCM5720 Gigabit Ethernet PCIe
>> >
>> >
>> >
>> > _______________________________________________
>> > pve-user mailing list
>> > pve-user@lists.proxmox.com
>> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>
>
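
For anyone reading the hardware error records quoted above, here is a minimal sketch that decodes the three AER register values from the log into named bits. The values are copied from the thread. The bit names are my own transcription of the PCIe AER Uncorrectable Error Status layout (the same positions used by the PCI_ERR_UNC_* constants in the Linux pci_regs.h header), and the function name decode_aer_uncor is hypothetical, not something from the thread or from any tool.

```python
# Sketch only: decode PCIe AER uncorrectable-error register values.
# The bit table is an assumption transcribed from the PCIe AER register
# layout; bits not listed here are reported by number instead of by name.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_aer_uncor(value):
    """Return the names of all bits set in a 32-bit AER register value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_UNCOR_BITS.get(bit, "bit %d (not in table)" % bit))
    return names


if __name__ == "__main__":
    # Register values as they appear in the quoted panic logs.
    registers = (
        ("aer_uncor_status", 0x00100000),
        ("aer_uncor_mask", 0x00010000),
        ("aer_uncor_severity", 0x000EF030),
    )
    for label, value in registers:
        print("%s = 0x%08x: %s" % (label, value, ", ".join(decode_aer_uncor(value))))
```

Under that table, the status value 0x00100000 has only bit 20 (Unsupported Request Error) set and the mask value 0x00010000 has only bit 16 (Unexpected Completion) set. Bit 20 is not among the bits set in the severity value 0x000ef030. I have not checked this against the tg3 discussion on kernel.org, so treat it as a reading aid rather than a diagnosis.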