* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
From: Gilberto Ferreira @ 2024-03-28 15:18 UTC
To: Proxmox VE user list

Try to update the server firmware.

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

On Thu, Mar 28, 2024 at 11:58, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

> From: Stefan Radman <stefan.radman@me.com>
> To: PVE User List <pve-user@pve.proxmox.com>
> Date: Thu, 28 Mar 2024 15:50:02 +0100
> Subject: 6.5.13-3-pve kernel panic on shutdown
>
> I recently noticed that a Dell PowerEdge R540 currently running Proxmox VE
> 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown.
>
> The kernel panic is triggered 3-4 seconds after the last network interface
> (an onboard BCM5720 LOM port) goes down, while the system enters the ACPI
> S5 (soft-off) state.
>
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
>
> This is reproducible on every reboot.
>
> The R540 and the BCM5720 are running the latest firmware available from the
> Dell support website.
>
> Link [2] below seems to suggest that my problem is related to a combination
> of ACPI S5, the tg3 driver and the BCM5720 on-board NIC.
>
> Has anyone else seen this lately (or ever) with Proxmox VE?
>
> Thank you
>
> Stefan
>
> [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440
>     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730
>
> [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" which
>     causes Bus Fatal Error when rebooting system with BCM5720 NIC
>     https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471
>
> [3] tg3: Disable tg3 device on system reboot to avoid triggering AER
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074
>
> [4] [PATCH] tg3: Disable tg3 device on system reboot to avoid triggering AER
>     https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/
>
> [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot
>     https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/
>
> [6] [PATCH] tg3: add new module param to force device power down on reboot
>     https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/
>
> [84458.600189] systemd-shutdown[1]: Syncing filesystems and block devices.
> [84458.607141] systemd-shutdown[1]: Rebooting.
> [84458.612283] spi-nor spi0.0: Software reset failed: -524
> [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
> [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
> [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
> [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
> [84460.001615] bond0: now running without any active interface!
> [84460.018133] vmbr0: port 1(bond0) entered disabled state
> [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
> [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [84463.685116] {1}[Hardware Error]: event severity: fatal
> [84463.685117] {1}[Hardware Error]: Error 0, type: fatal
> [84463.685119] {1}[Hardware Error]: section_type: PCIe error
> [84463.685120] {1}[Hardware Error]: port_type: 0, PCIe end point
> [84463.685121] {1}[Hardware Error]: version: 3.0
> [84463.685122] {1}[Hardware Error]: command: 0x0002, status: 0x0010
> [84463.685123] {1}[Hardware Error]: device_id: 0000:04:00.1
> [84463.685125] {1}[Hardware Error]: slot: 0
> [84463.685126] {1}[Hardware Error]: secondary_bus: 0x00
> [84463.685127] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
> [84463.685128] {1}[Hardware Error]: class_code: 020000
> [84463.685129] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [84463.685130] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
> [84463.685131] {1}[Hardware Error]: TLP Header: 40000001 0000010f 90028090 00000000
> [84463.685134] Kernel panic - not syncing: Fatal hardware error!
> [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P O 6.5.13-3-pve #1
> [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS 2.21.1 03/07/2024
> [84463.685140] Call Trace:
> [84463.685142] <NMI>
> …
>
> root@pve:~# pveversion
> pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
> root@pve:~# ethtool -i eno2
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:04:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> root@pve:~# lspci | fgrep 04:00.1
> 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
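The aer_uncor_status, aer_uncor_mask and aer_uncor_severity fields in the log are raw AER register values. As a reading aid, here is a minimal POSIX shell sketch (the function names are illustrative, not an existing tool) that names the set bits of an Uncorrectable Error Status value. It assumes the bit layout of the PCIe AER Uncorrectable Error Status register, as mirrored by the PCI_ERR_UNC_* constants in the Linux kernel's pci_regs.h; only the bits listed in the case statement are decoded.

```sh
#!/bin/sh
# Sketch: name the bits set in a PCIe AER Uncorrectable Error Status value,
# such as the aer_uncor_status field printed in the APEI log above.
# Bit positions follow the AER Uncorrectable Error Status register
# (the PCI_ERR_UNC_* constants in Linux include/uapi/linux/pci_regs.h).

aer_uncor_bit_name() {
  case "$1" in
    4)  echo "Data Link Protocol Error" ;;
    5)  echo "Surprise Down Error" ;;
    12) echo "Poisoned TLP" ;;
    13) echo "Flow Control Protocol Error" ;;
    14) echo "Completion Timeout" ;;
    15) echo "Completer Abort" ;;
    16) echo "Unexpected Completion" ;;
    17) echo "Receiver Overflow" ;;
    18) echo "Malformed TLP" ;;
    19) echo "ECRC Error" ;;
    20) echo "Unsupported Request Error" ;;
    21) echo "ACS Violation" ;;
    22) echo "Uncorrectable Internal Error" ;;
    *)  echo "(not decoded by this sketch)" ;;
  esac
}

# Print one line per set bit, e.g. "bit 20: Unsupported Request Error".
decode_aer_uncor() {
  value=$(( $1 ))   # accepts hex input such as 0x00100000
  bit=0
  while [ "$bit" -lt 32 ]; do
    if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
      printf 'bit %d: %s\n' "$bit" "$(aer_uncor_bit_name "$bit")"
    fi
    bit=$(( bit + 1 ))
  done
}

decode_aer_uncor 0x00100000   # aer_uncor_status; prints: bit 20: Unsupported Request Error
decode_aer_uncor 0x00010000   # aer_uncor_mask; prints: bit 16: Unexpected Completion
```

Applied to the R540 log, aer_uncor_status 0x00100000 has only bit 20 set, an Unsupported Request reported against the BCM5720 function 0000:04:00.1.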
* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
From: Gilberto Ferreira @ 2024-03-28 15:57 UTC
To: Proxmox VE user list

https://medium.com/@nothanjack/dealing-with-apei-generic-hardware-error-source-problems-in-linux-a8ee8a67c8c1

On Thu, Mar 28, 2024 at 12:54, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Date: Thu, 28 Mar 2024 16:47:43 +0100
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>
> Hi Gilberto
>
> The server firmware is up to date.
>
> Stefan
>
> > On Mar 28, 2024, at 16:18, Gilberto Ferreira <gilberto.nunes32@gmail.com> wrote:
> >
> > Try to update the server firmware.
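Stefan's `ethtool -i eno2` output reports firmware-version FFV22.71.3 bc 5720-v1.39 for the BCM5720. To check the same field on every tg3 port of a host at once, here is a minimal POSIX shell sketch. It assumes the standard Linux sysfs layout (/sys/class/net/<iface>/device/driver) and an installed ethtool; the function names are hypothetical helpers, not part of Proxmox VE.

```sh
#!/bin/sh
# Sketch: print "<iface> <firmware-version>" for every network interface
# bound to the tg3 driver. Assumes the usual Linux sysfs layout and that
# ethtool is installed; the function names are illustrative.

# Extract the firmware-version field from `ethtool -i` output on stdin.
fw_version_from_ethtool() {
  sed -n 's/^firmware-version: //p'
}

# $1: sysfs network class directory (defaults to /sys/class/net).
list_tg3_firmware() {
  net_dir=${1:-/sys/class/net}
  for dev in "$net_dir"/*; do
    # Bonds, bridges and lo (bond0, vmbr0, ...) have no device/driver link.
    [ -e "$dev/device/driver" ] || continue
    # The driver link points at .../drivers/<name>; keep only <name>.
    driver=$(basename "$(readlink "$dev/device/driver")")
    [ "$driver" = "tg3" ] || continue
    iface=$(basename "$dev")
    printf '%s %s\n' "$iface" "$(ethtool -i "$iface" | fw_version_from_ethtool)"
  done
}

# Usage: list_tg3_firmware
```

Running `list_tg3_firmware` on each host gives one line per BCM5720 port, which makes firmware differences between ports or hosts easy to spot.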
* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
From: Gilberto Ferreira @ 2024-04-02 7:37 UTC
To: Proxmox VE user list
Cc: Stefan Radman, PVE User List

Perhaps you should try a kernel other than 6.5, for instance 6.2.

On Tue, Apr 2, 2024, 02:43, Stefan Radman via pve-user <pve-user@lists.proxmox.com> wrote:

> From: Stefan Radman <stefan.radman@me.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc: PVE User List <pve-user@pve.proxmox.com>
> Date: Tue, 2 Apr 2024 07:42:32 +0200
> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
>
> Yesterday I had the same thing happen when shutting down a Dell PowerEdge
> R740.
>
> Again, the kernel panic was triggered by a BCM5720 running Broadcom
> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve.
>
> The R740 runs BIOS 2.21.2 (but it also happened with 2.20.1).
>
> Stefan
>
> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5
> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> [1325589.991223] {1}[Hardware Error]: event severity: fatal
> [1325589.991225] {1}[Hardware Error]: Error 0, type: fatal
> [1325589.991227] {1}[Hardware Error]: section_type: PCIe error
> [1325589.991228] {1}[Hardware Error]: port_type: 0, PCIe end point
> [1325589.991231] {1}[Hardware Error]: version: 3.0
> [1325589.991233] {1}[Hardware Error]: command: 0x0002, status: 0x0010
> [1325589.991235] {1}[Hardware Error]: device_id: 0000:01:00.1
> [1325589.991237] {1}[Hardware Error]: slot: 0
> [1325589.991239] {1}[Hardware Error]: secondary_bus: 0x00
> [1325589.991240] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
> [1325589.991242] {1}[Hardware Error]: class_code: 020000
> [1325589.991244] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
> [1325589.991246] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
> [1325589.991248] {1}[Hardware Error]: TLP Header: 40000001 0000010f 90028090 00000000
> [1325589.991252] Kernel panic - not syncing: Fatal hardware error!
> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 6.5.13-1-pve #1
> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.20.1 09/13/2023
> [1325589.991259] Call Trace:
> [1325589.991261] <NMI>
>
> root@per740:~# pveversion
> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve)
>
> root@per740:~# ethtool -i eno4
> driver: tg3
> version: 6.5.13-3-pve
> firmware-version: FFV22.71.3 bc 5720-v1.39
> expansion-rom-version:
> bus-info: 0000:01:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
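Both reports put the fatal APEI error a little over three seconds after the ACPI S5 transition, in line with the "3-4 seconds" in the original report. A small awk-based sketch can compute that gap from a captured kernel console log. It assumes dmesg-style `[seconds.microseconds]` timestamps (leading quote markers such as `> ` are tolerated), and the function name is illustrative; how the logs in this thread were captured is not stated.

```sh
#!/bin/sh
# Sketch: print the seconds between the ACPI S5 transition and the first
# APEI hardware error in a kernel console log read on stdin.
# Assumes dmesg-style "[seconds.micro]" timestamps on each line.
s5_to_aer_gap() {
  awk '
    # Return the timestamp inside the first "[...]" number of a log line.
    function ts(line,    t) {
      if (!match(line, /\[ *[0-9]+\.[0-9]+\]/)) return ""
      t = substr(line, RSTART + 1, RLENGTH - 2)
      gsub(/ /, "", t)
      return t
    }
    /Preparing to enter system sleep state S5/ { s5 = ts($0) }
    /Hardware error from APEI/ && s5 != "" {
      printf "%.3f\n", ts($0) - s5
      exit
    }
  '
}

# Usage: s5_to_aer_gap < console.log
```

Fed the two relevant R540 lines it prints 3.394; for the R740 lines it prints 3.276.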
* Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown [not found] ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com> @ 2024-04-02 11:15 ` Gilberto Ferreira 0 siblings, 0 replies; 4+ messages in thread From: Gilberto Ferreira @ 2024-04-02 11:15 UTC (permalink / raw) To: Stefan Radman; +Cc: PVE User List Good Em ter., 2 de abr. de 2024, 08:09, Stefan Radman <stefan.radman@me.com> escreveu: > Workaround: No more kernel panics on reboot when pinning kernel > 6.2.16-20-pve. > > Affected kernels: > 6.5.13-1-pve > 6.5.13-3-pve > > The original issue [1] was solved long ago [2] but apparently > re-introduced recently [3]. > > Regression [4] being discussed on kernel.org > > Looks like a back and forth in the tg3 driver. > > Note that the kernel panic is only triggered by “reboot” and not by > “shutdown”. > > Stefan > > root@per740:~# proxmox-boot-tool kernel list > Manually selected kernels: > None. > > Automatically selected kernels: > 6.2.16-20-pve > 6.5.13-1-pve > 6.5.13-3-pve > > Pinned kernel: > 6.2.16-20-pve > root@per740:~# pveversion > pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.2.16-20-pve) > > [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440 > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730 > > [2] tg3: Disable tg3 device on system reboot to avoid triggering AER > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca > > [3] tg3: power down device only on SYSTEM_POWER_OFF > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9fc3bc7643341dc5be7d269f3d3dbe441d8d7ac3 > > [4] * [PATCH] tg3: add new module param to force device power down on > reboot > > https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/ > > > On Apr 2, 2024, at 09:37, Gilberto Ferreira <gilberto.nunes32@gmail.com> > wrote: > > Perhaps you should try another kernel besides 6.15 like 6.2 for instance. > > Em ter., 2 de abr. 
de 2024, 02:43, Stefan Radman via pve-user < > pve-user@lists.proxmox.com> escreveu: > >> >> >> >> ---------- Forwarded message ---------- >> From: Stefan Radman <stefan.radman@me.com> >> To: Proxmox VE user list <pve-user@lists.proxmox.com> >> Cc: PVE User List <pve-user@pve.proxmox.com> >> Bcc: >> Date: Tue, 2 Apr 2024 07:42:32 +0200 >> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown >> Yesterday I had the same thing happen when shutting down a Dell PowerEdge >> R740. >> >> Again, the kernel panic was triggered by a BCM5720 running Broadcom >> firmware 22.71.3 and the tg3 driver from kernel 6.5.13-3-pve. >> >> R740 BIOS 2.21.2 (but also happened with 2.20.1) >> >> Stefan >> >> [1325586.715465] ACPI: PM: Preparing to enter system sleep state S5 >> [1325589.991219] {1}[Hardware Error]: Hardware error from APEI Generic >> Hardware Error Source: 5 >> [1325589.991223] {1}[Hardware Error]: event severity: fatal >> [1325589.991225] {1}[Hardware Error]: Error 0, type: fatal >> [1325589.991227] {1}[Hardware Error]: section_type: PCIe error >> [1325589.991228] {1}[Hardware Error]: port_type: 0, PCIe end point >> [1325589.991231] {1}[Hardware Error]: version: 3.0 >> [1325589.991233] {1}[Hardware Error]: command: 0x0002, status: 0x0010 >> [1325589.991235] {1}[Hardware Error]: device_id: 0000:01:00.1 >> [1325589.991237] {1}[Hardware Error]: slot: 0 >> [1325589.991239] {1}[Hardware Error]: secondary_bus: 0x00 >> [1325589.991240] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: >> 0x165f >> [1325589.991242] {1}[Hardware Error]: class_code: 020000 >> [1325589.991244] {1}[Hardware Error]: aer_uncor_status: 0x00100000, >> aer_uncor_mask: 0x00010000 >> [1325589.991246] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030 >> [1325589.991248] {1}[Hardware Error]: TLP Header: 40000001 0000010f >> 90028090 00000000 >> [1325589.991252] Kernel panic - not syncing: Fatal hardware error! 
>> [1325589.991254] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O >> 6.5.13-1-pve #1 >> [1325589.991258] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS >> 2.20.1 09/13/2023 >> [1325589.991259] Call Trace: >> [1325589.991261] <NMI> >> >> root@per740:~# pveversion >> pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-3-pve) >> >> root@per740:~# ethtool -i eno4 >> driver: tg3 >> version: 6.5.13-3-pve >> firmware-version: FFV22.71.3 bc 5720-v1.39 >> expansion-rom-version: >> bus-info: 0000:01:00.1 >> supports-statistics: yes >> supports-test: yes >> supports-eeprom-access: yes >> supports-register-dump: yes >> supports-priv-flags: no >> >> >> > On Mar 28, 2024, at 15:50, Stefan Radman via pve-user < >> pve-user@lists.proxmox.com> wrote: >> > >> > >> > From: Stefan Radman <stefan.radman@me.com> >> > Subject: 6.5.13-3-pve kernel panic on shutdown >> > Date: March 28, 2024 at 15:50:02 GMT+1 >> > To: PVE User List <pve-user@pve.proxmox.com> >> > >> > >> > I recently noticed that a Dell Poweredge R540 currently running Proxmox >> VE 8.1.8 (kernel 6.5.13-3-pve) throws a kernel panic on shutdown. >> > >> > The kernel panic is triggered 3-4 seconds after the last network >> interface goes down (onboard BCM5720 LOM), while the system enters S5 >> (sleep) state. >> > >> > [84459.970212] bond0: (slave eno1): link status definitely down, >> disabling slave >> > [84459.982170] bond0: (slave eno2): link status definitely down, >> disabling slave >> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode >> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode >> > [84460.001615] bond0: now running without any active interface! >> > [84460.018133] vmbr0: port 1(bond0) entered disabled state >> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5 >> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic >> Hardware Error Source: 5 >> > >> > This is reproducible on every reboot. 
>> > >> > R540 and BCM5720 are running the latest firmware available from the >> Dell support website. >> > >> > Link [2] below seem to suggest that my problem is related to a >> combination of ACPI S5, the tg3 driver and the BCM5720 on-board NIC. >> > >> > Has anyone else seen this lately (or ever) with Promox VE? >> > >> > Thank you >> > >> > Stefan >> > >> > [1] Use ACPI S5 for reboot #1904225: causes reboot crash on Dell T440 >> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1962730 >> > >> > [2] [SRU][Regression] Revert "PM: ACPI: reboot: Use S5 for reboot" >> which causes Bus Fatal Error when rebooting system with BCM5720 NIC >> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471 >> > >> > [3] tg3: Disable tg3 device on system reboot to avoid triggering AER >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/broadcom/tg3.c?id=2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca#n18074 >> > >> > [4] * [PATCH] tg3: Disable tg3 device on system reboot to avoid >> triggering AER >> > >> https://lore.kernel.org/netdev/CAAd53p7PmEp+vWLz+fGdDntGQ2KqgL54fo86Bpy7oy9tKzXsAg@mail.gmail.com/T/ >> > >> > [5] [v4,2/2] PM: ACPI: reboot: Reinstate S5 for reboot >> > >> https://patches.linaro.org/project/linux-acpi/patch/20220916043319.119716-2-kai.heng.feng@canonical.com/ >> > >> > [6] * [PATCH] tg3: add new module param to force device power down on >> reboot >> > >> https://lore.kernel.org/lkml/d8ed4af1-5c83-4895-9fc3-9aea25724fd9@gmail.com/T/ >> > >> > >> > [84458.600189] systemd-shutdown[1]: Syncing filesystems and block >> devices. >> > [84458.607141] systemd-shutdown[1]: Rebooting. 
>> > [84458.612283] spi-nor spi0.0: Software reset failed: -524
>> > [84459.777370] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
>> > [84459.970212] bond0: (slave eno1): link status definitely down, disabling slave
>> > [84459.982170] bond0: (slave eno2): link status definitely down, disabling slave
>> > [84459.990037] tg3 0000:04:00.0 eno1: left promiscuous mode
>> > [84459.995822] tg3 0000:04:00.0 eno1: left allmulticast mode
>> > [84460.001615] bond0: now running without any active interface!
>> > [84460.018133] vmbr0: port 1(bond0) entered disabled state
>> > [84460.291379] ACPI: PM: Preparing to enter system sleep state S5
>> > [84463.685113] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
>> > [84463.685116] {1}[Hardware Error]: event severity: fatal
>> > [84463.685117] {1}[Hardware Error]: Error 0, type: fatal
>> > [84463.685119] {1}[Hardware Error]: section_type: PCIe error
>> > [84463.685120] {1}[Hardware Error]: port_type: 0, PCIe end point
>> > [84463.685121] {1}[Hardware Error]: version: 3.0
>> > [84463.685122] {1}[Hardware Error]: command: 0x0002, status: 0x0010
>> > [84463.685123] {1}[Hardware Error]: device_id: 0000:04:00.1
>> > [84463.685125] {1}[Hardware Error]: slot: 0
>> > [84463.685126] {1}[Hardware Error]: secondary_bus: 0x00
>> > [84463.685127] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
>> > [84463.685128] {1}[Hardware Error]: class_code: 020000
>> > [84463.685129] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
>> > [84463.685130] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
>> > [84463.685131] {1}[Hardware Error]: TLP Header: 40000001 0000010f 90028090 00000000
>> > [84463.685134] Kernel panic - not syncing: Fatal hardware error!
>> > [84463.685136] CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: P O 6.5.13-3-pve #1
>> > [84463.685139] Hardware name: Dell Inc. PowerEdge R540/0VC7DK, BIOS 2.21.1 03/07/2024
>> > [84463.685140] Call Trace:
>> > [84463.685142] <NMI>
>> > …
>> >
>> > root@pve:~# pveversion
>> > pve-manager/8.1.8/d29041d9f87575d0 (running kernel: 6.5.13-3-pve)
>> > root@pve:~# ethtool -i eno2
>> > driver: tg3
>> > version: 6.5.13-3-pve
>> > firmware-version: FFV22.71.3 bc 5720-v1.39
>> > expansion-rom-version:
>> > bus-info: 0000:04:00.1
>> > supports-statistics: yes
>> > supports-test: yes
>> > supports-eeprom-access: yes
>> > supports-register-dump: yes
>> > supports-priv-flags: no
>> > root@pve:~# lspci | fgrep 04:00.1
>> > 04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
>> >
>> > _______________________________________________
>> > pve-user mailing list
>> > pve-user@lists.proxmox.com
>> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>> ---------- Forwarded message ----------
>> From: Stefan Radman via pve-user <pve-user@lists.proxmox.com>
>> To: Proxmox VE user list <pve-user@lists.proxmox.com>
>> Cc: Stefan Radman <stefan.radman@me.com>, PVE User List <pve-user@pve.proxmox.com>
>> Bcc:
>> Date: Tue, 2 Apr 2024 07:42:32 +0200
>> Subject: Re: [PVE-User] 6.5.13-3-pve kernel panic on shutdown
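The `aer_uncor_status`, `aer_uncor_mask` and `aer_uncor_severity` values in the R540 log are raw bitmasks from the device's PCIe AER Uncorrectable Error registers. As an illustration only, here is a minimal POSIX sh sketch that names the set bits, assuming the standard AER bit layout (the one Linux defines as the `PCI_ERR_UNC_*` constants in `include/uapi/linux/pci_regs.h`). The helper names `aer_uncor_bit_name` and `decode_aer_uncor` are hypothetical and not part of Proxmox VE or any kernel tool.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): decode PCIe AER Uncorrectable Error register
# values as printed in a "[Hardware Error]" block. Bit positions assume the
# standard AER layout, i.e. the PCI_ERR_UNC_* constants in linux/pci_regs.h.

# Print the name of one uncorrectable-error bit position.
aer_uncor_bit_name() {
    case "$1" in
        4)  echo "Data Link Protocol Error" ;;
        5)  echo "Surprise Down Error" ;;
        12) echo "Poisoned TLP" ;;
        13) echo "Flow Control Protocol Error" ;;
        14) echo "Completion Timeout" ;;
        15) echo "Completer Abort" ;;
        16) echo "Unexpected Completion" ;;
        17) echo "Receiver Overflow" ;;
        18) echo "Malformed TLP" ;;
        19) echo "ECRC Error" ;;
        20) echo "Unsupported Request" ;;
        21) echo "ACS Violation" ;;
        22) echo "Uncorrectable Internal Error" ;;
        23) echo "MC Blocked TLP" ;;
        24) echo "AtomicOp Egress Blocked" ;;
        25) echo "TLP Prefix Blocked" ;;
        *)  echo "bit $1 (not decoded by this sketch)" ;;
    esac
}

# Print one line per bit set in a 32-bit register value.
# Hex input such as 0x00100000 is parsed by shell arithmetic.
decode_aer_uncor() {
    _aer_value=$(( $1 ))
    _aer_bit=0
    while [ "$_aer_bit" -lt 32 ]; do
        if [ $(( ($_aer_value >> $_aer_bit) & 1 )) -eq 1 ]; then
            aer_uncor_bit_name "$_aer_bit"
        fi
        _aer_bit=$(( $_aer_bit + 1 ))
    done
}

# Values copied from the R540 panic log.
echo "status:";   decode_aer_uncor 0x00100000
echo "mask:";     decode_aer_uncor 0x00010000
echo "severity:"; decode_aer_uncor 0x000ef030
```

Run against the R540 values, the status decodes to `Unsupported Request` and the mask to `Unexpected Completion`. The severity value decodes to nine error types that do not include Unsupported Request; read with the usual AER convention that a set severity bit marks that error type as fatal, the device's own severity register does not classify the reported error as fatal.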
end of thread

Thread overview: 4+ messages
[not found] <D8D305A2-D2B7-4A5D-821C-65DE75621457@kmi.com>
[not found] ` <93280CB8-7582-4456-9101-D594CE2C86A2@kmi.com>
[not found] ` <mailman.755.1711637904.434.pve-user@lists.proxmox.com>
2024-03-28 15:18 ` [PVE-User] 6.5.13-3-pve kernel panic on shutdown Gilberto Ferreira
[not found] ` <mailman.761.1711641292.434.pve-user@lists.proxmox.com>
2024-03-28 15:57 ` Gilberto Ferreira
[not found] ` <mailman.785.1712036605.434.pve-user@lists.proxmox.com>
2024-04-02 7:37 ` Gilberto Ferreira
[not found] ` <5D727A1E-902A-4CA4-BEF8-A0F1CBFA754E@me.com>
2024-04-02 11:15 ` Gilberto Ferreira