* Re: [PVE-User] Not sure if this is a corosync issue.
[not found] <mailman.162.1616696319.347.pve-user@lists.proxmox.com>
@ 2021-03-25 18:26 ` Gilberto Ferreira
[not found] ` <mailman.165.1616698612.347.pve-user@lists.proxmox.com>
0 siblings, 1 reply; 3+ messages in thread
From: Gilberto Ferreira @ 2021-03-25 18:26 UTC (permalink / raw)
To: jameslipski, Proxmox VE user list
Which PVE version are you running?
Was this node upgraded from a previous PVE version?
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram
On Thu, Mar 25, 2021 at 15:19, jameslipski via pve-user
<pve-user@lists.proxmox.com> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: jameslipski <jameslipski@protonmail.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: Not sure if this is a corosync issue.
> Greetings,
>
> Today, one of my nodes seems to have rebooted randomly (node in question has been in a production environment for several months; no issues since it was added to the cluster). During my investigation, the following is what I see before the crash; unfortunately, I'm having a little bit of an issue deciphering this:
>
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
> Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - cfs lock update failed - Permission denied
> Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
> Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 'exit-code'.
> Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication runner.
> Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
>
> I see that corosync experienced the following:
>
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
>
> and I'm not too sure why. Also not sure if that alone took down the system. Any help is much appreciated. If any additional information is needed, please let us know. Thank you.
>
> _______________________________________________
> pve-user mailing list
> pve-user@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
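A log excerpt like the one quoted above can be easier to read when grouped by process. The following is only a minimal sketch, not an official Proxmox or corosync tool: the regular expression assumes the "Mon DD HH:MM:SS host proc[pid]: message" layout seen in the excerpt, and the names LINE_RE, parse, summarize and SAMPLE are made up for illustration. It reports, per process, the first message and the number of lines, which helps show what was logged first around 12:54:54.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted excerpt:
# "Mon DD HH:MM:SS host proc[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def parse(lines):
    """Yield (timestamp, process, message) for every line that matches."""
    for line in lines:
        # Strip mail quote markers ("> ") before matching.
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if m:
            yield m["ts"], m["proc"], m["msg"]


def summarize(lines):
    """Return (first message per process, line count per process)."""
    first_seen = {}
    counts = Counter()
    for ts, proc, msg in parse(lines):
        first_seen.setdefault(proc, (ts, msg))
        counts[proc] += 1
    return first_seen, counts


# A few lines copied from the excerpt above.
SAMPLE = """\
Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
""".splitlines()

if __name__ == "__main__":
    first_seen, counts = summarize(SAMPLE)
    for proc, (ts, msg) in first_seen.items():
        print(f"{ts}  {proc:<12} x{counts[proc]}  first: {msg}")
```

The summary only orders what was logged; it does not by itself say why corosync received SIGSEGV or whether that caused the reboot.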
* Re: [PVE-User] Not sure if this is a corosync issue.
[not found] ` <mailman.165.1616698612.347.pve-user@lists.proxmox.com>
@ 2021-03-25 18:58 ` Gilberto Ferreira
[not found] ` <mailman.166.1616699596.347.pve-user@lists.proxmox.com>
0 siblings, 1 reply; 3+ messages in thread
From: Gilberto Ferreira @ 2021-03-25 18:58 UTC (permalink / raw)
To: jameslipski, Proxmox VE user list
Hi,
You should consider carefully upgrading to a newer version.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram
On Thu, Mar 25, 2021 at 15:56, jameslipski via pve-user
<pve-user@lists.proxmox.com> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: jameslipski <jameslipski@protonmail.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:56:10 +0000
> Subject: Re: [PVE-User] Not sure if this is a corosync issue.
> Hello,
>
> All nodes are running the same version, 6.0-4. Output of pveversion -v:
>
> proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
> pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
> pve-kernel-5.0: 6.0-5
> pve-kernel-helper: 6.0-5
> pve-kernel-5.0.15-1-pve: 5.0.15-1
> ceph: 14.2.2-pve1
> ceph-fuse: 14.2.2-pve1
> corosync: 3.0.2-pve2
> criu: 3.11-3
> glusterfs-client: 5.5-3
> ksm-control-daemon: 1.3-1
> libjs-extjs: 6.0.1-10
> libknet1: 1.10-pve1
> libpve-access-control: 6.0-2
> libpve-apiclient-perl: 3.0-2
> libpve-common-perl: 6.0-2
> libpve-guest-common-perl: 3.0-1
> libpve-http-server-perl: 3.0-2
> libpve-storage-perl: 6.0-5
> libqb0: 1.0.5-1
> lvm2: 2.03.02-pve3
> lxc-pve: 3.1.0-61
> lxcfs: 3.0.3-pve60
> novnc-pve: 1.0.0-60
> proxmox-mini-journalreader: 1.1-1
> proxmox-widget-toolkit: 2.0-5
> pve-cluster: 6.0-4
> pve-container: 3.0-3
> pve-docs: 6.0-4
> pve-edk2-firmware: 2.20190614-1
> pve-firewall: 4.0-5
> pve-firmware: 3.0-2
> pve-ha-manager: 3.0-2
> pve-i18n: 2.0-2
> pve-qemu-kvm: 4.0.0-3
> pve-xtermjs: 3.13.2-1
> qemu-server: 6.0-5
> smartmontools: 7.0-pve2
> spiceterm: 3.1-1
> vncterm: 1.6-1
> zfsutils-linux: 0.8.1-pve1
>
* Re: [PVE-User] Not sure if this is a corosync issue.
[not found] ` <mailman.166.1616699596.347.pve-user@lists.proxmox.com>
@ 2021-03-25 19:16 ` Gilberto Ferreira
0 siblings, 0 replies; 3+ messages in thread
From: Gilberto Ferreira @ 2021-03-25 19:16 UTC (permalink / raw)
To: jameslipski, Proxmox VE user list
Nice! Keep us posted.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram
On Thu, Mar 25, 2021 at 16:13, jameslipski via pve-user
<pve-user@lists.proxmox.com> wrote:
>
>
>
>
> ---------- Forwarded message ----------
> From: jameslipski <jameslipski@protonmail.com>
> To: Proxmox VE user list <pve-user@lists.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 19:12:38 +0000
> Subject: Re: [PVE-User] Not sure if this is a corosync issue.
> Hi,
>
> Alright, I'll try. Since these nodes are in production, it might be a while before I get a chance to.