From: Gilberto Ferreira
Date: Thu, 25 Mar 2021 15:58:29 -0300
To: jameslipski, Proxmox VE user list
Subject: Re: [PVE-User] Not sure if this is a corosync issue.

Hi,

You should consider carefully updating to a newer version.
---
Gilberto Nunes Ferreira
(47) 99676-7530 - Whatsapp / Telegram

On Thu, 25 Mar 2021 at 15:56, jameslipski via pve-user wrote:
>
> ---------- Forwarded message ----------
> From: jameslipski
> To: Proxmox VE user list
> Cc:
> Bcc:
> Date: Thu, 25 Mar 2021 18:56:10 +0000
> Subject: Re: [PVE-User] Not sure if this is a corosync issue.
> Hello,
>
> All nodes are running the same version, 6.0-4. pveversion -v shows:
>
> proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
> pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
> pve-kernel-5.0: 6.0-5
> pve-kernel-helper: 6.0-5
> pve-kernel-5.0.15-1-pve: 5.0.15-1
> ceph: 14.2.2-pve1
> ceph-fuse: 14.2.2-pve1
> corosync: 3.0.2-pve2
> criu: 3.11-3
> glusterfs-client: 5.5-3
> ksm-control-daemon: 1.3-1
> libjs-extjs: 6.0.1-10
> libknet1: 1.10-pve1
> libpve-access-control: 6.0-2
> libpve-apiclient-perl: 3.0-2
> libpve-common-perl: 6.0-2
> libpve-guest-common-perl: 3.0-1
> libpve-http-server-perl: 3.0-2
> libpve-storage-perl: 6.0-5
> libqb0: 1.0.5-1
> lvm2: 2.03.02-pve3
> lxc-pve: 3.1.0-61
> lxcfs: 3.0.3-pve60
> novnc-pve: 1.0.0-60
> proxmox-mini-journalreader: 1.1-1
> proxmox-widget-toolkit: 2.0-5
> pve-cluster: 6.0-4
> pve-container: 3.0-3
> pve-docs: 6.0-4
> pve-edk2-firmware: 2.20190614-1
> pve-firewall: 4.0-5
> pve-firmware: 3.0-2
> pve-ha-manager: 3.0-2
> pve-i18n: 2.0-2
> pve-qemu-kvm: 4.0.0-3
> pve-xtermjs: 3.13.2-1
> qemu-server: 6.0-5
> smartmontools: 7.0-pve2
> spiceterm: 3.1-1
> vncterm: 1.6-1
> zfsutils-linux: 0.8.1-pve1
>
> ------- Original Message -------
> On Thursday, March 25, 2021 2:26 PM, Gilberto Ferreira wrote:
>
> > What PVE version?
> > Is this an upgrade from a previous PVE version?
> >
> > ---------------------------------------------------------------------
> >
> > Gilberto Nunes Ferreira
> > (47) 99676-7530 - Whatsapp / Telegram
> >
> > On Thu, 25 Mar 2021 at 15:19, jameslipski via pve-user
> > pve-user@lists.proxmox.com wrote:
> >
> > > ---------- Forwarded message ----------
> > > From: jameslipski jameslipski@protonmail.com
> > > To: Proxmox VE user list pve-user@lists.proxmox.com
> > > Cc:
> > > Bcc:
> > > Date: Thu, 25 Mar 2021 18:02:25 +0000
> > > Subject: Not sure if this is a corosync issue.
> > > Greetings,
> > > Today, one of my nodes seems to have rebooted randomly (the node in question has been in a production environment for several months, with no issues since it was added to the cluster). During my investigation, the following is what I see before the crash; unfortunately, I'm having a bit of trouble deciphering it:
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_leave failed: 2
> > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [confdb] crit: cmap_dispatch failed: 2
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [quorum] crit: quorum_dispatch failed: 2
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] notice: node lost quorum
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_dispatch failed: 2
> > > Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: cpg_leave failed: 2
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [quorum] crit: can't initialize service
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [confdb] crit: can't initialize service
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] notice: start cluster connection
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: can't initialize service
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] notice: start cluster connection
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:54:55 node09 pmxcfs[5419]: [status] crit: can't initialize service
> > > Mar 25 12:54:56 node09 pve-ha-crm[2161]: status change slave => wait_for_quorum
> > > Mar 25 12:55:00 node09 systemd[1]: Starting Proxmox VE replication runner...
> > > Mar 25 12:55:00 node09 pve-ha-lrm[2169]: lost lock 'ha_agent_node09_lock - cfs lock update failed - Permission denied
> > > Mar 25 12:55:01 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:01 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:01 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:01 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:01 node09 CRON[2755555]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> > > Mar 25 12:55:02 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:03 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:04 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:05 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:05 node09 pve-ha-lrm[2169]: status change active => lost_agent_lock
> > > Mar 25 12:55:06 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:07 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:07 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:07 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:07 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:07 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:08 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:09 node09 pvesr[2755547]: trying to acquire cfs lock 'file-replication_cfg' ...
> > > Mar 25 12:55:10 node09 pvesr[2755547]: error with cfs lock 'file-replication_cfg': no quorum!
> > > Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
> > > Mar 25 12:55:10 node09 systemd[1]: pvesr.service: Failed with result 'exit-code'.
> > > Mar 25 12:55:10 node09 systemd[1]: Failed to start Proxmox VE replication runner.
> > > Mar 25 12:55:13 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:13 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:13 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:13 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:19 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:19 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:19 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:19 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:25 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:25 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:25 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:25 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:31 node09 pmxcfs[5419]: [quorum] crit: quorum_initialize failed: 2
> > > Mar 25 12:55:31 node09 pmxcfs[5419]: [confdb] crit: cmap_initialize failed: 2
> > > Mar 25 12:55:31 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
> > > Mar 25 12:55:31 node09 pmxcfs[5419]: [status] crit: cpg_initialize failed: 2
> > > I see that corosync experienced the following:
> > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
> > > Mar 25 12:54:54 node09 systemd[1]: corosync.service: Failed with result 'signal'.
> > > and I'm not too sure why. I'm also not sure whether that alone took down the system. Any help is much appreciated. If any additional information is needed, please let us know. Thank you.
> > > ---------- Forwarded message ----------
> > > From: jameslipski via pve-user pve-user@lists.proxmox.com
> > > To: Proxmox VE user list pve-user@lists.proxmox.com
> > > Cc: jameslipski jameslipski@protonmail.com
> > > Bcc:
> > > Date: Thu, 25 Mar 2021 18:02:25 +0000
> > > Subject: [PVE-User] Not sure if this is a corosync issue.
> > >
> > > pve-user mailing list
> > > pve-user@lists.proxmox.com
> > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
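
For anyone trying to read a journal excerpt like the one quoted in this thread, it can help to collapse the repeated lines and see which message appeared first. Below is a minimal Python sketch, not a diagnostic tool: it only groups syslog-style lines by process and message and records the first timestamp and a count. The names summarize, LINE_RE and SAMPLE are made up for this illustration, and the sample lines are copied from the log quoted above. It will not explain why corosync received SIGSEGV; it only makes the order of events easier to see.

```python
import re

# Assumed line shape (classic syslog), e.g.:
#   Mar 25 12:54:54 node09 pmxcfs[5419]: [dcdb] crit: cpg_dispatch failed: 2
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"   # timestamp
    r"(?P<host>\S+)\s+"                               # hostname
    r"(?P<proc>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s+"   # process name, optional [pid]
    r"(?P<msg>.*)$"                                   # rest of the line
)


def summarize(lines):
    """Map (process, message) -> (first timestamp seen, number of occurrences).

    Keys keep the order of first appearance. Leading email quote markers
    ("> > > ") are stripped; lines that do not look like syslog are skipped.
    """
    summary = {}
    for raw in lines:
        m = LINE_RE.match(raw.lstrip("> ").rstrip())
        if not m:
            continue
        key = (m["proc"], m["msg"])
        first_ts, count = summary.get(key, (m["ts"], 0))
        summary[key] = (first_ts, count + 1)
    return summary


# A few lines taken from the log quoted in this thread.
SAMPLE = """\
Mar 25 12:54:54 node09 pmxcfs[5419]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
Mar 25 12:54:54 node09 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
Mar 25 12:54:55 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
Mar 25 12:55:01 node09 pmxcfs[5419]: [dcdb] crit: cpg_initialize failed: 2
""".splitlines()

if __name__ == "__main__":
    for (proc, msg), (ts, n) in summarize(SAMPLE).items():
        print(f"{ts}  {proc:<8} x{n}  {msg}")
```

Feeding it the full excerpt (for example, the lines of a saved log file) should reduce the repeated pmxcfs retries to a handful of entries, so the pmxcfs assertion and the corosync exit at 12:54:54 stand out from the retry noise that follows.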