From: Gilberto Ferreira
Date: Thu, 25 Mar 2021 15:26:56 -0300
To: jameslipski, Proxmox VE user list
Subject: Re: [PVE-User] Not sure if this is a corosync issue.
List-Id: Proxmox VE user list

Which PVE version are you running? Was this node upgraded from an earlier PVE version?

---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

On Thu, 25 Mar 2021 at 15:19, jameslipski via pve-user wrote:
>
> ---------- Forwarded message ----------
> From: jameslipski
> To: Proxmox VE user list
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: Not sure if this is a corosync issue.
> Greetings,
>
> Today, one of my nodes seems to have rebooted randomly (node in question has been in a production environment for several months; no issues since it was added to the cluster). During my investigation, the following is what I see before the crash; unfortunately, I'm having a little bit of an issue deciphering this:
>
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
> Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - cfs lock update failed - Permission denied
> Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
> Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
> Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 'exit-code'.
> Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication runner.
> Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
>
> I see that corosync experienced the following:
>
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
>
> and I'm not too sure why. Also not sure if that alone took down the system. Any help is much appreciated. If any additional information is needed, please let us know. Thank you.
>
>
> ---------- Forwarded message ----------
> From: jameslipski via pve-user
> To: Proxmox VE user list
> Cc: jameslipski
> Bcc:
> Date: Thu, 25 Mar 2021 18:02:25 +0000
> Subject: [PVE-User] Not sure if this is a corosync issue.