* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]   ` <mailman.755.1711637904.434.pve-user@lists.proxmox.com>
@ 2024-03-28 15:18     ` Gilberto Ferreira
       [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
       [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Gilberto Ferreira @ 2024-03-28 15:18 UTC (permalink / raw)
  To: Proxmox VE user list

Try to update the server firmware.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Thu, Mar 28, 2024 at 11:58, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: PVE User List <pve-user@pve.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 28 Mar 2024 15:50:02 +0100
> Subject: 6.5.13-3-pve kernel panic on shutdown
> I recently noticed that a Dell PowerEdge R540 currently running Proxmox VE
> 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>
> The kernel panic is triggered 3-4 seconds after the last network interface
> goes down (onboard BCM5720 LOM), while the system enters S5 (sleep) state.
>
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
>
> This is reproducible on every reboot.
>
> The R540 and the BCM5720 are running the latest firmware available from the
> Dell support website.
>
> Link [2] below seems to suggest that my problem is related to a combination
> of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>
> Has anyone else seen this lately (or ever) with Proxmox VE?
>
> Thank you
>
> Stefan
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which causes Bus Fatal Error when rebooting system with BCM5720 NIC
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>
> [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>
> [4] [PATCH] tg3: Disable tg3 device on system reboot to avoid triggering AER
> https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/
>
> [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
> https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/
>
> [6] [PATCH] tg3: add new module param to force device power down on reboot
> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
>
> [84458.600189] systemd-shutdown[1]: Syncing filesystems and block devices.
> [84458.607141] systemd-shutdown[1]: Rebooting.
> [84458.612283] spi-nor spi0.0: Software reset failed: -524
> [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [84463.685116] {1}[Hardware Error]: event severity: fatal
> [84463.685117] {1}[Hardware Error]:  Error 0, type: fatal
> [84463.685119] {1}[Hardware Error]:   section_type: PCIe error
> [84463.685120] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [84463.685121] {1}[Hardware Error]:   version: 3.0
> [84463.685122] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
> [84463.685123] {1}[Hardware Error]:   device_id: 0000:04:00.1
> [84463.685125] {1}[Hardware Error]:   slot: 0
> [84463.685126] {1}[Hardware Error]:   secondary_bus: 0x00
> [84463.685127] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
> [84463.685128] {1}[Hardware Error]:   class_code: 020000
> [84463.685129] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [84463.685130] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
> [84463.685131] {1}[Hardware Error]:   TLP Header: 40000001 0000010f 90028090 00000000
> [84463.685134] Kernel panic - not syncing: Fatal hardware error!
> [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P           O       6.5.13-3-pve #1
> [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS 2.21.1 03/07/2024
> [84463.685140] Call Trace:
> [84463.685142]  <NMI>
> …
>
> root@pve:~# pveversion
> pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
> root@pve:~# ethtool -i eno2
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:04:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> root@pve:~# lspci | fgrep 04:00.1
> 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
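The aer_uncor_status, aer_uncor_mask and aer_uncor_severity values in the panic log above are bitmasks. The following is a minimal sketch of a decoder, assuming the standard bit assignments of the PCI Express AER uncorrectable error registers. The function name decode_aer_uncor is hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the bits set in a PCIe AER
# uncorrectable-error bitmask (status, mask or severity register).
# Bit positions are assumed to follow the PCI Express AER capability layout.
decode_aer_uncor() {
    value=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    for entry in \
        "4:Data Link Protocol Error" \
        "5:Surprise Down Error" \
        "12:Poisoned TLP" \
        "13:Flow Control Protocol Error" \
        "14:Completion Timeout" \
        "15:Completer Abort" \
        "16:Unexpected Completion" \
        "17:Receiver Overflow" \
        "18:Malformed TLP" \
        "19:ECRC Error" \
        "20:Unsupported Request Error" \
        "21:ACS Violation"
    do
        bit=${entry%%:*}    # text before the first colon
        name=${entry#*:}    # text after the first colon
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            printf 'bit %s: %s\n' "$bit" "$name"
        fi
    done
}

# Values copied from the panic log in this thread.
decode_aer_uncor 0x00100000   # aer_uncor_status
decode_aer_uncor 0x00010000   # aer_uncor_mask
```

With the values from the log, the status decodes to bit 20 (Unsupported Request Error) and the mask to bit 16 (Unexpected Completion). The thread itself does not establish why the NIC reports that error, so this only helps in reading the log.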



* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
@ 2024-03-28 15:57         ` Gilberto Ferreira
  0 siblings, 0 replies; 4+ messages in thread
From: Gilberto Ferreira @ 2024-03-28 15:57 UTC (permalink / raw)
  To: Proxmox VE user list

https://medium.com/@nothanjack/dealing-with-apei-generic-hardware-error-source-problems-in-linux-a8ee8a67c8c1
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram






On Thu, Mar 28, 2024 at 12:54, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 28 Mar 2024 16:47:43 +0100
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
> Hi Gilberto
>
> The server firmware is up to date.
>
> Stefan
>

* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
@ 2024-04-02  7:37       ` Gilberto Ferreira
       [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Gilberto Ferreira @ 2024-04-02  7:37 UTC (permalink / raw)
  To: Proxmox VE user list; +Cc: Stefan Radman, PVE User List

Perhaps you should try a kernel other than 6.5, such as 6.2.

On Tue, Apr 2, 2024, 02:43, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc: PVE User List <pve-user@pve.proxmox.com>
> Bcc:
> Date: Tue, 2 Apr 2024 07:42:32 +0200
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
> R740.
>
> Again, the kernel panic was triggered by a BCM5720 running Broadcom
> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>
> R740 BIOS 2.21.2 (but also happened with 2.20.1)
>
> Stefan
>
> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [1325589.991223] {1}[Hardware Error]: event severity: fatal
> [1325589.991225] {1}[Hardware Error]:  Error 0, type: fatal
> [1325589.991227] {1}[Hardware Error]:   section_type: PCIe error
> [1325589.991228] {1}[Hardware Error]:   port_type: 0, PCIe end point
> [1325589.991231] {1}[Hardware Error]:   version: 3.0
> [1325589.991233] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
> [1325589.991235] {1}[Hardware Error]:   device_id: 0000:01:00.1
> [1325589.991237] {1}[Hardware Error]:   slot: 0
> [1325589.991239] {1}[Hardware Error]:   secondary_bus: 0x00
> [1325589.991240] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
> [1325589.991242] {1}[Hardware Error]:   class_code: 020000
> [1325589.991244] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [1325589.991246] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
> [1325589.991248] {1}[Hardware Error]:   TLP Header: 40000001 0000010f 90028090 00000000
> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O       6.5.13-1-pve #1
> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.20.1 09/13/2023
> [1325589.991259] Call Trace:
> [1325589.991261]  <NMI>
>
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>
> root@per740:~# ethtool -i eno4
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:01:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
>
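Both reports identify the NIC through `ethtool -i`. To check whether another host has the same driver and firmware combination, the relevant fields can be pulled out of that output. This is a minimal sketch; the function name ethtool_field is hypothetical, and the sample text is copied from the R740 message above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from `ethtool -i`
# output supplied on stdin (prints nothing if the field is absent or empty).
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample lines copied from the R740 report in this thread.
sample='driver: tg3
version: 6.5.13-3-pve
firmware-version: FFV22.71.3 bc 5720-v1.39
bus-info: 0000:01:00.1'

printf '%s\n' "$sample" | ethtool_field driver             # prints: tg3
printf '%s\n' "$sample" | ethtool_field firmware-version   # prints: FFV22.71.3 bc 5720-v1.39

# On a real host (not run here), the input would come from ethtool itself:
#   ethtool -i eno4 | ethtool_field driver
```

The interface name eno4 is the one used in the report and will differ on other systems.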

* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
       [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
@ 2024-04-02 11:15           ` Gilberto Ferreira
  0 siblings, 0 replies; 4+ messages in thread
From: Gilberto Ferreira @ 2024-04-02 11:15 UTC (permalink / raw)
  To: Stefan Radman; +Cc: PVE User List

Good

On Tue, Apr 2, 2024, 08:09, Stefan Radman <stefan.radman@me.com> wrote:

> Workaround: pinning kernel 6.2.16-20-pve stops the kernel panics on reboot.
>
> Affected kernels:
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> The original issue [1] was fixed long ago [2], but was apparently
> reintroduced recently [3].
>
> The regression [4] is being discussed on kernel.org.
>
> It looks like a back-and-forth in the tg3 driver.
>
> Note that the kernel panic is triggered only by “reboot”, not by
> “shutdown”.
>
> Stefan
>
> root@per740:~# proxmox-boot-tool kernel list
> Manually selected kernels:
> None.
>
> Automatically selected kernels:
> 6.2.16-20-pve
> 6.5.13-1-pve
> 6.5.13-3-pve
>
> Pinned kernel:
> 6.2.16-20-pve
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.2.16-20-pve)
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] tg3: Disable tg3 device on system reboot to avoid triggering AER
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>
> [3] tg3: power down device only on SYSTEM_POWER_OFF
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9fc3bc7643341dc5be7d269f3d3dbe441d8d7ac3
>
> [4] [PATCH] tg3: add new module param to force device power down on reboot
> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
>
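The workaround above amounts to pinning a kernel from before the regression and confirming that the pin took effect. This is a minimal sketch of that check. The helper names is_affected_kernel and pinned_kernel are hypothetical, and the affected list contains only the two kernels named in this thread, so other 6.5 builds may or may not behave the same way.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel version is one of the
# two versions reported as affected in this thread.
is_affected_kernel() {
    case "$1" in
        6.5.13-1-pve|6.5.13-3-pve) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the line following "Pinned kernel:" from the
# output of `proxmox-boot-tool kernel list` supplied on stdin.
pinned_kernel() {
    awk '/^Pinned kernel:/ { if ((getline line) > 0) print line; exit }'
}

# Sample output copied from the message above.
sample='Manually selected kernels:
None.

Automatically selected kernels:
6.2.16-20-pve
6.5.13-1-pve
6.5.13-3-pve

Pinned kernel:
6.2.16-20-pve'

printf '%s\n' "$sample" | pinned_kernel   # prints: 6.2.16-20-pve

# On a real host (not run here), the workaround would be applied with:
#   proxmox-boot-tool kernel pin 6.2.16-20-pve
# checked against the running kernel with:
#   is_affected_kernel "$(uname -r)" && echo "running an affected kernel"
# and reverted once a fixed kernel is installed with:
#   proxmox-boot-tool kernel unpin
```

The pin only takes effect on the next boot, so the first reboot after pinning still runs the affected kernel on the way down.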
> On Apr 2, 2024, at 09:37, Gilberto Ferreira <gilberto.nunes32@gmail.com>
> wrote:
>
> Perhaps you should try another kernel besides 6.15 like 6.2 for instance.
>
> Em ter., 2 de abr. de 2024, 02:43, Stefan Radman via pve-user <
> pve-user@lists.proxmox.com> escreveu:
>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Stefan Radman <stefan.radman@me.com>
>> To: Proxmox VE user list <pve-user@lists.proxmox.com>
>> Cc: PVE User List <pve-user@pve.proxmox.com>
>> Bcc:
>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
>> R740.
>>
>> Again, the kernel panic was triggered by a BCM5720 running Broadcom
>> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>>
>> R740 BIOS 2.21.2 (but also happened with 2.20.1)
>>
>> Stefan
>>
>> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
>> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic
>> Hardware Error Source: 5
>> [1325589.991223] {1}[Hardware Error]: event severity: fatal
>> [1325589.991225] {1}[Hardware Error]:  Error 0, type: fatal
>> [1325589.991227] {1}[Hardware Error]:   section_type: PCIe error
>> [1325589.991228] {1}[Hardware Error]:   port_type: 0, PCIe end point
>> [1325589.991231] {1}[Hardware Error]:   version: 3.0
>> [1325589.991233] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
>> [1325589.991235] {1}[Hardware Error]:   device_id: 0000:01:00.1
>> [1325589.991237] {1}[Hardware Error]:   slot: 0
>> [1325589.991239] {1}[Hardware Error]:   secondary_bus: 0x00
>> [1325589.991240] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id:
>> 0x165f
>> [1325589.991242] {1}[Hardware Error]:   class_code: 020000
>> [1325589.991244] {1}[Hardware Error]:   aer_uncor_status: 0x00100000,
>> aer_uncor_mask: 0x00010000
>> [1325589.991246] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
>> [1325589.991248] {1}[Hardware Error]:   TLP Header: 40000001 0000010f
>> 90028090 00000000
>> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
>> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O
>>    6.5.13-1-pve #1
>> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS
>> 2.20.1 09/13/2023
>> [1325589.991259] Call Trace:
>> [1325589.991261]  <NMI>
>>
>> root@per740:~# pveversion
>> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>>
>> root@per740:~# ethtool -i eno4
>> driver: tg3
>> version: 6.5.13-3-pve
>> firmware-version: FFV22.71.3 bc 5720-v1.39
>> expansion-rom-version:
>> bus-info: 0000:01:00.1
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: no
>>
>>
>> > On Mar 28, 2024, at 15:50, Stefan Radman via pve-user <
>> pve-user@lists.proxmox.com> wrote:
>> >
>> >
>> > From: Stefan Radman <stefan.radman@me.com>
>> > Subject: 6.5.13-3-pve kernel panic on shutdown
>> > Date: March 28, 2024 at 15:50:02 GMT+1
>> > To: PVE User List <pve-user@pve.proxmox.com>
>> >
>> >
>> > I recently noticed that a Dell Poweredge R540 currently running Proxmox
>> VE 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>> >
>> > The kernel panic is triggered 3-4 seconds after the last network
>> interface goes down (onboard BCM5720 LOM), while the system enters S5
>> (sleep) state.
>> >
>> > [84459.970212] bond0: (slave eno1): link status definitely down,
>> disabling slave
>> > [84459.982170] bond0: (slave eno2): link status definitely down,
>> disabling slave
>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>> > [84460.001615] bond0: now running without any active interface!
>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic
>> Hardware Error Source: 5
>> >
>> > This is reproducible on every reboot.
>> >
>> > R540 and BCM5720 are running the latest firmware available from the
>> Dell support website.
>> >
>> > Link [2] below seem to suggest that my problem is related to a
>> combination of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>> >
>> > Has anyone else seen this lately (or ever) with Promox VE?
>> >
>> > Thank you
>> >
>> > Stefan
>> >
>> > [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>> >
>> > [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which causes Bus Fatal Error when rebooting system with BCM5720 NIC
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>> >
>> > [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>> >
>> > [4] * [PATCH] tg3: Disable tg3 device on system reboot to avoid triggering AER
>> > https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/
>> >
>> > [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
>> > https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/
>> >
>> > [6] * [PATCH] tg3: add new module param to force device power down on reboot
>> > https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>> >
>> >
>> > [84458.600189] systemd-shutdown[1]: Syncing filesystems and block devices.
>> > [84458.607141] systemd-shutdown[1]: Rebooting.
>> > [84458.612283] spi-nor spi0.0: Software reset failed: -524
>> > [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
>> > [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
>> > [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>> > [84460.001615] bond0: now running without any active interface!
>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
>> > [84463.685116] {1}[Hardware Error]: event severity: fatal
>> > [84463.685117] {1}[Hardware Error]:  Error 0, type: fatal
>> > [84463.685119] {1}[Hardware Error]:   section_type: PCIe error
>> > [84463.685120] {1}[Hardware Error]:   port_type: 0, PCIe end point
>> > [84463.685121] {1}[Hardware Error]:   version: 3.0
>> > [84463.685122] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
>> > [84463.685123] {1}[Hardware Error]:   device_id: 0000:04:00.1
>> > [84463.685125] {1}[Hardware Error]:   slot: 0
>> > [84463.685126] {1}[Hardware Error]:   secondary_bus: 0x00
>> > [84463.685127] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
>> > [84463.685128] {1}[Hardware Error]:   class_code: 020000
>> > [84463.685129] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
>> > [84463.685130] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
>> > [84463.685131] {1}[Hardware Error]:   TLP Header: 40000001 0000010f 90028090 00000000
>> > [84463.685134] Kernel panic - not syncing: Fatal hardware error!
>> > [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P  O       6.5.13-3-pve #1
>> > [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS 2.21.1 03/07/2024
>> > [84463.685140] Call Trace:
>> > [84463.685142]  <NMI>
>> > …
>> >
>> > root@pve:~# pveversion
>> > pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
>> > root@pve:~# ethtool -i eno2
>> > driver: tg3
>> > version: 6.5.13-3-pve
>> > firmware-version: FFV22.71.3 bc 5720-v1.39
>> > expansion-rom-version:
>> > bus-info: 0000:04:00.1
>> > supports-statistics: yes
>> > supports-test: yes
>> > supports-eeprom-access: yes
>> > supports-register-dump: yes
>> > supports-priv-flags: no
>> > root@pve:~# lspci | fgrep 04:00.1
>> > 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
>> >
>> >
>> >
>> > _______________________________________________
>> > pve-user mailing list
>> > pve-user@lists.proxmox.com
>> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Stefan Radman via pve-user <pve-user@lists.proxmox.com>
>> To: Proxmox VE user list <pve-user@lists.proxmox.com>
>> Cc: Stefan Radman <stefan.radman@me.com>, PVE User List <
>> pve-user@pve.proxmox.com>
>> Bcc:
>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>>
>
>
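The AER fields in the quoted panic log can be decoded by hand. The following is a minimal sketch and not part of the original report. It assumes the bit positions of the PCIe AER Uncorrectable Error Status register and the standard 3DW memory request TLP header layout; `decode_aer_uncor` and `decode_tlp_header` are hypothetical helper names introduced only for illustration.

```python
# Sketch: decode the AER values and TLP header quoted in the panic log.
# Bit names are assumed from the PCIe AER Uncorrectable Error Status layout.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
}


def decode_aer_uncor(value):
    """Return the names of all known bits set in an AER uncorrectable register value."""
    return [name for bit, name in sorted(AER_UNCOR_BITS.items()) if value & (1 << bit)]


def decode_tlp_header(dw0, dw1, dw2):
    """Split the first three header DWORDs of a 3DW memory request TLP into fields."""
    return {
        "fmt": (dw0 >> 29) & 0x7,          # 0b010 = 3DW header with data
        "type": (dw0 >> 24) & 0x1F,        # 0b00000 = memory request
        "length_dw": dw0 & 0x3FF,          # payload length in DWORDs
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "first_dw_be": dw1 & 0xF,          # byte enables of the first DWORD
        "address": dw2 & 0xFFFFFFFC,       # 32-bit address, DWORD aligned
    }


# Values copied from the quoted log.
status = 0x00100000
mask = 0x00010000
severity = 0x000EF030
header = decode_tlp_header(0x40000001, 0x0000010F, 0x90028090)

print("status:  ", decode_aer_uncor(status))
print("mask:    ", decode_aer_uncor(mask))
print("severity:", decode_aer_uncor(severity))
print("status bit marked fatal in severity register:", bool(status & severity))
print("TLP:", {k: hex(v) for k, v in header.items()})
```

Under these assumptions the status value decodes to Unsupported Request Error, a bit that is clear in aer_uncor_severity, and the logged TLP reads as a one-DWORD 32-bit memory write to address 0x90028090. That would be consistent with a write reaching the NIC after it was powered down, the scenario addressed by links [3] and [6], but the log alone does not prove it.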



Thread overview: 4+ messages
     [not found] <D8D305A2-D2B7-4A5D-821C-65DE75621457@kmi.com>
     [not found] ` <93280CB8-7582-4456-9101-D594CE2C86A2@kmi.com>
     [not found]   ` <mailman.755.1711637904.434.pve-user@lists.proxmox.com>
2024-03-28 15:18     ` [PVE-User] 6.5.13-3-pve kernel panic on shutdown Gilberto Ferreira
     [not found]       ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
2024-03-28 15:57         ` Gilberto Ferreira
     [not found]     ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
2024-04-02  7:37       ` Gilberto Ferreira
     [not found]         ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
2024-04-02 11:15           ` Gilberto Ferreira
