public inbox for pve-user@lists.proxmox.com
* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]   ` <mailman.755.1711637904.434.pve-user@lists.proxmox.com>
@ 2024-03-28 15:18     ` Gilberto Ferreira
       [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
       [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Gilberto Ferreira @ 2024-03-28 15:18 UTC (permalink / raw)
  To: Proxmox VE user list

Try to update the server firmware.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Thu, Mar 28, 2024 at 11:58, Stefan Radman via pve-user <
pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: PVE User List <pve-user@pve.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 28 Mar 2024 15:50:02 +0100
> Subject: 6.5.13-3-pve kernel panic on shutdown
> I recently noticed that a Dell PowerEdge R540 currently running Proxmox VE
> 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>
> The kernel panic is triggered 3-4 seconds after the last network interface
> goes down (onboard BCM5720 LOM), while the system enters S5 (sleep) state.
>
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
>
> This is reproducible on every reboot.
>
> R540 and BCM5720 are running the latest firmware available from the Dell
> support website.
>
> Link [2] below seems to suggest that my problem is related to a combination
> of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>
> Has anyone else seen this lately (or ever) with Proxmox VE?
>
> Thank you
>
> Stefan
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which
> causes Bus Fatal Error when rebooting system with BCM5720 NIC
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>
> [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>
> [4] * [PATCH] tg3: Disable tg3 device on system reboot to avoid triggering
> AER
>
> https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/
>
> [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
>
> https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/
>
> [6] * [PATCH] tg3: add new module param to force device power down on
> reboot
>
> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
>
> [84458.600189] systemd-shutdown[1]: Syncing filesystems and block devices.
> [84458.607141] systemd-shutdown[1]: Rebooting.
> [84458.612283] spi-nor spi0.0: Software reset failed: -524
> [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [84463.685116] {1}[Hardware Error]: event severity: fatal
> [84463.685117] {1}[Hardware Error]:  Error 0, type: fatal
> [84463.685119] {1}[Hardware Error]:   section_type: PCIe error
> [84463.685120] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [84463.685121] {1}[Hardware Error]:   version: 3.0
> [84463.685122] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
> [84463.685123] {1}[Hardware Error]:   device_id: 0000:04:00.1
> [84463.685125] {1}[Hardware Error]:   slot: 0
> [84463.685126] {1}[Hardware Error]:   secondary_bus: 0x00
> [84463.685127] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
> [84463.685128] {1}[Hardware Error]:   class_code: 020000
> [84463.685129] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [84463.685130] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
> [84463.685131] {1}[Hardware Error]:   TLP Header: 40000001 0000010f 90028090 00000000
> [84463.685134] Kernel panic - not syncing: Fatal hardware error!
> [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P           O       6.5.13-3-pve #1
> [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS 2.21.1 03/07/2024
> [84463.685140] Call Trace:
> [84463.685142]  <NMI>
> …
>
> root@pve:~# pveversion
> pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
> root@pve:~# ethtool -i eno2
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:04:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> root@pve:~# lspci | fgrep 04:00.1
> 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
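
For anyone reading the AER fields in the log quoted above: the following is a minimal sketch, not taken from the thread, that names the bits set in an AER uncorrectable-error register value. The bit names follow the PCI Express Uncorrectable Error Status register layout as commonly documented; `decode_aer_uncor` is a hypothetical helper name chosen for illustration, and only a subset of bits is named.

```shell
#!/usr/bin/env bash
# Sketch: name the bits set in a PCIe AER uncorrectable error register value.
# decode_aer_uncor is a hypothetical helper, not part of any existing tool.
# Bit positions are assumed from the PCIe Uncorrectable Error Status layout.

decode_aer_uncor() {
    local value=$(( $1 ))   # accepts hex input such as 0x00100000
    local -a names
    names[4]="Data Link Protocol Error"
    names[5]="Surprise Down Error"
    names[12]="Poisoned TLP"
    names[13]="Flow Control Protocol Error"
    names[14]="Completion Timeout"
    names[15]="Completer Abort"
    names[16]="Unexpected Completion"
    names[17]="Receiver Overflow"
    names[18]="Malformed TLP"
    names[19]="ECRC Error"
    names[20]="Unsupported Request Error"
    names[21]="ACS Violation"
    local bit
    for bit in $(seq 0 31); do
        # Print one line per set bit, with a fallback for bits not named here.
        if (( (value >> bit) & 1 )); then
            echo "bit ${bit}: ${names[bit]:-unnamed in this sketch}"
        fi
    done
}

# Register values copied from the log above.
decode_aer_uncor 0x00100000   # aer_uncor_status
decode_aer_uncor 0x00010000   # aer_uncor_mask
```

With the values from the log, the status register decodes to bit 20 (Unsupported Request Error) and the mask to bit 16 (Unexpected Completion). What that implies about the root cause is not established in this thread.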



* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
@ 2024-03-28 15:57         ` Gilberto Ferreira
  0 siblings, 0 replies; 4+ messages in thread
From: Gilberto Ferreira @ 2024-03-28 15:57 UTC (permalink / raw)
  To: Proxmox VE user list

https://medium.com/@nothanjack/dealing-with-apei-generic-hardware-error-source-problems-in-linux-a8ee8a67c8c1
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Thu, Mar 28, 2024 at 12:54, Stefan Radman via pve-user <
pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 28 Mar 2024 16:47:43 +0100
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
> Hi Gilberto
>
> The server firmware is up to date.
>
> Stefan
>



* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
@ 2024-04-02  7:37       ` Gilberto Ferreira
       [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Gilberto Ferreira @ 2024-04-02  7:37 UTC (permalink / raw)
  To: Proxmox VE user list; +Cc: Stefan Radman, PVE User List

Perhaps you should try a kernel other than 6.5, such as 6.2.

On Tue, Apr 2, 2024, 02:43, Stefan Radman via pve-user <
pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc: PVE User List <pve-user@pve.proxmox.com>
> Bcc:
> Date: Tue, 2 Apr 2024 07:42:32 +0200
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
> R740.
>
> Again, the kernel panic was triggered by a BCM5720 running Broadcom
> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>
> R740 BIOS 2.21.2 (but also happened with 2.20.1)
>
> Stefan
>
> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [1325589.991223] {1}[Hardware Error]: event severity: fatal
> [1325589.991225] {1}[Hardware Error]:  Error 0, type: fatal
> [1325589.991227] {1}[Hardware Error]:   section_type: PCIe error
> [1325589.991228] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [1325589.991231] {1}[Hardware Error]:   version: 3.0
> [1325589.991233] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
> [1325589.991235] {1}[Hardware Error]:   device_id: 0000:01:00.1
> [1325589.991237] {1}[Hardware Error]:   slot: 0
> [1325589.991239] {1}[Hardware Error]:   secondary_bus: 0x00
> [1325589.991240] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
> [1325589.991242] {1}[Hardware Error]:   class_code: 020000
> [1325589.991244] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [1325589.991246] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
> [1325589.991248] {1}[Hardware Error]:   TLP Header: 40000001 0000010f 90028090 00000000
> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O       6.5.13-1-pve #1
> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.20.1 09/13/2023
> [1325589.991259] Call Trace:
> [1325589.991261]  <NMI>
>
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>
> root@per740:~# ethtool -i eno4
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:01:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
>
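
To compare other hosts against the driver and firmware combination reported above, the `ethtool -i` output can be reduced to the fields that matter here. This is a minimal sketch under stated assumptions: `nic_summary` is a hypothetical helper name, and the sample input is copied from the R740 output quoted above.

```shell
#!/usr/bin/env bash
# Sketch: reduce `ethtool -i <iface>` output to "driver | firmware | bus-info".
# nic_summary is a hypothetical helper; it reads ethtool -i output on stdin.

nic_summary() {
    # Split each line on ": " so the colons inside the PCI address survive.
    awk -F': ' '
        $1 == "driver"           { driver = $2 }
        $1 == "firmware-version" { fw = $2 }
        $1 == "bus-info"         { bus = $2 }
        END { printf "%s | %s | %s\n", driver, fw, bus }
    '
}

# Sample input taken from the R740 output quoted in this thread.
nic_summary <<'EOF'
driver: tg3
version: 6.5.13-3-pve
firmware-version: FFV22.71.3 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:01:00.1
EOF

# On a live host (the interface name is an assumption, adjust as needed):
#   ethtool -i eno4 | nic_summary
```

A host that reports `tg3` with the same firmware string matches the NIC side of the reports in this thread. Whether it also panics depends on the running kernel.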



* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
@ 2024-04-02 11:15           ` Gilberto Ferreira
  0 siblings, 0 replies; 4+ messages in thread
From: Gilberto Ferreira @ 2024-04-02 11:15 UTC (permalink / raw)
  To: Stefan Radman; +Cc: PVE User List

Good

On Tue, Apr 2, 2024, 08:09, Stefan Radman <stefan.radman@me.com>
wrote:

> Workaround: No more kernel panics on reboot when pinning kernel
> 6.2.16-20-pve.
>
> Affected kernels:
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> The original issue [1] was solved long ago [2] but was apparently
> reintroduced recently [3].
>
> The regression [4] is being discussed on kernel.org.
>
> It looks like a back-and-forth in the tg3 driver.
>
> Note that the kernel panic is only triggered by “reboot” and not by
> “shutdown”.
>
> Stefan
>
> root@per740:~# proxmox-boot-tool kernel list
> Manually selected kernels:
> None.
>
> Automatically selected kernels:
> 6.2.16-20-pve
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> Pinned kernel:
> 6.2.16-20-pve
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.2.16-20-pve)
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] tg3: Disable tg3 device on system reboot to avoid triggering AER
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>
> [3] tg3: power down device only on SYSTEM_POWER_OFF
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9fc3bc7643341dc5be7d269f3d3dbe441d8d7ac3
>
> [4] * [PATCH] tg3: add new module param to force device power down on
> reboot
>
> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
>
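
The pinning workaround described above can be sketched as follows. This is an illustration, not a tested procedure: the kernel versions are the ones reported in this thread, `is_affected_kernel` is a hypothetical helper name, and the script only prints the `proxmox-boot-tool kernel pin` and `unpin` commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: check the running kernel against the versions reported as affected
# in this thread and print the pin command for the reported workaround kernel.
# The version lists come from the thread and are not exhaustive.

affected_kernels="6.5.13-1-pve 6.5.13-3-pve"
workaround_kernel="6.2.16-20-pve"

# Return 0 if the given kernel release is one of the reported affected versions.
# is_affected_kernel is a hypothetical helper name.
is_affected_kernel() {
    local k
    for k in $affected_kernels; do
        [ "$1" = "$k" ] && return 0
    done
    return 1
}

running="$(uname -r)"
if is_affected_kernel "$running"; then
    echo "Running kernel $running is reported as affected in this thread."
    echo "To pin the workaround kernel, run as root and then reboot:"
    echo "  proxmox-boot-tool kernel pin $workaround_kernel"
    echo "To remove the pin once a fixed kernel is available:"
    echo "  proxmox-boot-tool kernel unpin"
else
    echo "Running kernel $running is not in the list reported in this thread."
fi
```

The script does not pin anything itself, so the change to the boot configuration stays a manual step. The workaround kernel must already be installed, as it is in the `proxmox-boot-tool kernel list` output above.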
>> > [84463.685129] {1}[Hardware Error]:   aer_uncor_status: 0x00100000,
>> aer_uncor_mask: 0x00010000
>> > [84463.685130] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
>> > [84463.685131] {1}[Hardware Error]:   TLP Header: 40000001 0000010f
>> 90028090 00000000
>> > [84463.685134] Kernel panic - not syncing: Fatal hardware error!
>> > [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P
>>  O       6.5.13-3-pve #1
>> > [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS
>> 2.21.1 03/07/2024
>> > [84463.685140] Call Trace:
>> > [84463.685142]  <NMI>
>> > …
>> >
>> > root@pve:~# pveversion
>> > pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
>> > root@pve:~# ethtool -i eno2
>> > driver: tg3
>> > version: 6.5.13-3-pve
>> > firmware-version: FFV22.71.3 bc 5720-v1.39
>> > expansion-rom-version:
>> > bus-info: 0000:04:00.1
>> > supports-statistics: yes
>> > supports-test: yes
>> > supports-eeprom-access: yes
>> > supports-register-dump: yes
>> > supports-priv-flags: no
>> > root@pve:~# lspci | fgrep 04:00.1
>> > 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
>> BCM5720 Gigabit Ethernet PCIe
>> >
>> >
>> >
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Stefan Radman via pve-user <pve-user@lists.proxmox.com>
>> To: Proxmox VE user list <pve-user@lists.proxmox.com>
>> Cc: Stefan Radman <stefan.radman@me.com>, PVE User List <
>> pve-user@pve.proxmox.com>
>> Bcc:
>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>>
>
>
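For anyone reading the AER fields in the quoted panic log, here is a minimal sketch that decodes the aer_uncor_status and aer_uncor_mask values into named error bits. The bit positions follow the PCI_ERR_UNC_* definitions in the Linux header include/uapi/linux/pci_regs.h; the table AER_UNCOR_BITS and the helper decode_aer_uncor are illustrative names of mine and are not part of any tool mentioned in this thread.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register,
# as defined by the PCI_ERR_UNC_* constants in Linux's pci_regs.h.
# The dictionary and function names below are hypothetical helpers.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_aer_uncor(value):
    """Return the names of all known bits set in an AER uncorrectable register value."""
    return [name for bit, name in sorted(AER_UNCOR_BITS.items()) if value & (1 << bit)]


# Values copied from the log quoted above.
print("status:", decode_aer_uncor(0x00100000))
print("mask:  ", decode_aer_uncor(0x00010000))
```

With the values from the log, the status register has only bit 20 set and the mask register has only bit 16 set. This only names the bits and says nothing by itself about the root cause.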



end of thread, other threads:[~2024-04-02 11:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <D8D305A2-D2B7-4A5D-821C-65DE75621457@kmi.com>
     [not found] ` <93280CB8-7582-4456-9101-D594CE2C86A2@kmi.com>
     [not found]   ` <mailman.755.1711637904.434.pve-user@lists.proxmox.com>
2024-03-28 15:18     ` [PVE-User] 6.5.13-3-pve kernel panic on shutdown Gilberto Ferreira
     [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
2024-03-28 15:57         ` Gilberto Ferreira
     [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
2024-04-02  7:37       ` Gilberto Ferreira
     [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
2024-04-02 11:15           ` Gilberto Ferreira
