Date: Thu, 24 Sep 2020 16:02:04 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <216436814.339545.1599142316781.JavaMail.zimbra@odiso.com>
 <2bdde345-b966-d393-44d1-e5385821fbad@proxmox.com>
 <65105078.871552.1600269422383.JavaMail.zimbra@odiso.com>
 <1600333910.bmtyynl8cl.astroid@nora.none>
 <475756962.894651.1600336772315.JavaMail.zimbra@odiso.com>
 <86855479.894870.1600336947072.JavaMail.zimbra@odiso.com>
 <501f031f-3f1b-0633-fab3-7fcb7fdddaf5@proxmox.com>
 <335862527.964527.1600646099489.JavaMail.zimbra@odiso.com>
 <7286111.1004215.1600753433567.JavaMail.zimbra@odiso.com>
In-Reply-To: <7286111.1004215.1600753433567.JavaMail.zimbra@odiso.com>
User-Agent: astroid/0.15.0 (https://github.com/astroidmail/astroid)
Message-Id: <1600955976.my09kyug7a.astroid@nora.none>
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean
 shutdown

On September 22, 2020 7:43 am, Alexandre DERUMIER wrote:
> I have done a test with "kill -9 <pid of corosync>", and I get around a
> 20s hang on the other nodes, but after that everything becomes available
> again.
> 
> So it really is something in the corosync shutdown phase, while pmxcfs is
> still running.
> 
> So, for now, as a workaround, I have changed
> 
> /lib/systemd/system/pve-cluster.service
> 
> #Wants=corosync.service
> #Before=corosync.service
> Requires=corosync.service
> After=corosync.service
> 
> Like this, at shutdown, pve-cluster is stopped before corosync, and if I
> restart corosync, pve-cluster is stopped first.
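
(side note on the workaround: editing the unit under /lib/systemd/system
gets overwritten on the next pve-cluster package upgrade. a unit file of
the same name under /etc/systemd/system shadows the shipped one completely
and survives upgrades - a rough sketch, using only standard systemd
mechanics:

  cp /lib/systemd/system/pve-cluster.service /etc/systemd/system/
  # apply the Wants=/Before= -> Requires=/After= change to the copy, then:
  systemctl daemon-reload

a plain drop-in would not be enough here, since drop-ins can only add
dependencies - the existing Before= ordering cannot be reset that way.)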

if you are still able to test, it would be great if you could give the
following packages a spin (they only contain some extra debug prints
on message processing/sending):

http://download.proxmox.com/temp/pmxcfs-dbg/

64eb9dbd2f60fe319abaeece89a84fd5de1f05a8c38cb9871058ec1f55025486ec15b7c0053976159fe5c2518615fd80084925abf4d3a5061ea7d6edef264c36  pve-cluster_6.1-8_amd64.deb
04b557c7f0dc1aa2846b534d6afab70c2b8d4720ac307364e015d885e2e997b6dcaa54cad673b22d626d27cb053e5723510fde15d078d5fe1f262fc5486e6cef  pve-cluster-dbgsym_6.1-8_amd64.deb

ideally, you could get the debug logs from all nodes, and the
coredump/bt from the node where pmxcfs hangs. thanks!
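
(a rough sketch of the steps on each node - assuming the two lines above
are sha512 checksums and gdb is available; the /etc/pve/.debug toggle is
the documented way to enable verbose pmxcfs logging, the rest is standard
tooling:

  # verify and install the debug builds
  sha512sum -c SHA512SUMS       # the two checksum lines saved to SHA512SUMS
  apt install ./pve-cluster_6.1-8_amd64.deb ./pve-cluster-dbgsym_6.1-8_amd64.deb

  # enable verbose pmxcfs logging (do this before reproducing the hang)
  echo "1" > /etc/pve/.debug

  # once pmxcfs hangs, grab a backtrace of all threads from the live process
  gdb -p $(pidof pmxcfs) -batch -ex 'thread apply all bt full'

adjust file names/paths as needed.)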

diff --git a/data/src/dfsm.c b/data/src/dfsm.c
index 529c7f9..e0bd93f 100644
--- a/data/src/dfsm.c
+++ b/data/src/dfsm.c
@@ -162,8 +162,8 @@ static void
 dfsm_send_sync_message_abort(dfsm_t *dfsm)
 {
 	g_return_if_fail(dfsm != NULL);
-
 	g_mutex_lock (&dfsm->sync_mutex);
+	cfs_dom_debug(dfsm->log_domain, "dfsm_send_sync_message_abort - %" PRIu64" / %" PRIu64, dfsm->msgcount_rcvd, dfsm->msgcount);
 	dfsm->msgcount_rcvd = dfsm->msgcount;
 	g_cond_broadcast (&dfsm->sync_cond);
 	g_mutex_unlock (&dfsm->sync_mutex);
@@ -181,6 +181,7 @@ dfsm_record_local_result(
 
 	g_mutex_lock (&dfsm->sync_mutex);
 	dfsm_result_t *rp = (dfsm_result_t *)g_hash_table_lookup(dfsm->results, &msg_count);
+	cfs_dom_debug(dfsm->log_domain, "dfsm_record_local_result - %" PRIu64":  %d", msg_count, msg_result);
 	if (rp) {
 		rp->result = msg_result;
 		rp->processed = processed;
@@ -235,6 +236,8 @@ dfsm_send_state_message_full(
 	g_return_val_if_fail(DFSM_VALID_STATE_MESSAGE(type), CS_ERR_INVALID_PARAM);
 	g_return_val_if_fail(!len || iov != NULL, CS_ERR_INVALID_PARAM);
 
+	cfs_dom_debug(dfsm->log_domain, "dfsm_send_state_message_full: type %d len %d", type, len);
+
 	dfsm_message_state_header_t header;
 	header.base.type = type;
 	header.base.subtype = 0;
@@ -317,6 +320,7 @@ dfsm_send_message_sync(
 	for (int i = 0; i < len; i++)
 		real_iov[i + 1] = iov[i];
 
+	cfs_dom_debug(dfsm->log_domain, "dfsm_send_message_sync: type NORMAL, msgtype %d, len %d", msgtype, len);
 	cs_error_t result = dfsm_send_message_full(dfsm, real_iov, len + 1, 1);
 
 	g_mutex_unlock (&dfsm->sync_mutex);
@@ -335,10 +339,12 @@ dfsm_send_message_sync(
 	if (rp) {
 		g_mutex_lock (&dfsm->sync_mutex);
 
-		while (dfsm->msgcount_rcvd < msgcount)
+		while (dfsm->msgcount_rcvd < msgcount) {
+			cfs_dom_debug(dfsm->log_domain, "dfsm_send_message_sync: waiting for received messages %" PRIu64 " / %" PRIu64, dfsm->msgcount_rcvd, msgcount);
 			g_cond_wait (&dfsm->sync_cond, &dfsm->sync_mutex);
+		}
+		cfs_dom_debug(dfsm->log_domain, "dfsm_send_message_sync: done waiting for received messages!");
 
-
 		g_hash_table_remove(dfsm->results, &rp->msgcount);
 	
 		g_mutex_unlock (&dfsm->sync_mutex);
@@ -685,9 +691,13 @@ dfsm_cpg_deliver_callback(
 		return;
 	}
 
+	cfs_dom_debug(dfsm->log_domain, "received message's header type is %d", base_header->type);
+
 	if (base_header->type == DFSM_MESSAGE_NORMAL) {
 
 		dfsm_message_normal_header_t *header = (dfsm_message_normal_header_t *)msg;
+		cfs_dom_debug(dfsm->log_domain, "received normal message (type = %d, subtype = %d, %zd bytes)",
+			      base_header->type, base_header->subtype, msg_len);
 
 		if (msg_len < sizeof(dfsm_message_normal_header_t)) {
 			cfs_dom_critical(dfsm->log_domain, "received short message (type = %d, subtype = %d, %zd bytes)",
@@ -704,6 +714,8 @@ dfsm_cpg_deliver_callback(
 		} else {
 
 			int msg_res = -1;
+			cfs_dom_debug(dfsm->log_domain, "deliver message %" PRIu64 " (subtype = %d, length = %zd)",
+				      header->count, base_header->subtype, msg_len);
 			int res = dfsm->dfsm_callbacks->dfsm_deliver_fn(
 				dfsm, dfsm->data, &msg_res, nodeid, pid, base_header->subtype,
 				base_header->time, (uint8_t *)msg + sizeof(dfsm_message_normal_header_t),
@@ -724,6 +736,8 @@ dfsm_cpg_deliver_callback(
 	 */
 
 	dfsm_message_state_header_t *header = (dfsm_message_state_header_t *)msg;
+	cfs_dom_debug(dfsm->log_domain, "received state message (type = %d, subtype = %d, %zd bytes), mode is %d",
+			 base_header->type, base_header->subtype, msg_len, mode);
 
 	if (msg_len < sizeof(dfsm_message_state_header_t)) {
 		cfs_dom_critical(dfsm->log_domain, "received short state message (type = %d, subtype = %d, %zd bytes)",
@@ -744,6 +758,7 @@ dfsm_cpg_deliver_callback(
 
 	if (mode == DFSM_MODE_SYNCED) {
 		if (base_header->type == DFSM_MESSAGE_UPDATE_COMPLETE) {
+			cfs_dom_debug(dfsm->log_domain, "received update complete message");
 
 			for (int i = 0; i < dfsm->sync_info->node_count; i++)
 				dfsm->sync_info->nodes[i].synced = 1;
@@ -754,6 +769,7 @@ dfsm_cpg_deliver_callback(
 			return;
 
 		} else if (base_header->type == DFSM_MESSAGE_VERIFY_REQUEST) {
+			cfs_dom_debug(dfsm->log_domain, "received verify request message");
 
 			if (msg_len != sizeof(dfsm->csum_counter)) {
 				cfs_dom_critical(dfsm->log_domain, "cpg received verify request with wrong length (%zd bytes) form node %d/%d", msg_len, nodeid, pid);
@@ -823,7 +839,6 @@ dfsm_cpg_deliver_callback(
 	} else if (mode == DFSM_MODE_START_SYNC) {
 
 		if (base_header->type == DFSM_MESSAGE_SYNC_START) {
-
 			if (nodeid != dfsm->lowest_nodeid) {
 				cfs_dom_critical(dfsm->log_domain, "ignore sync request from wrong member %d/%d",
 						 nodeid, pid);
@@ -861,6 +876,7 @@ dfsm_cpg_deliver_callback(
 			return;
 
 		} else if (base_header->type == DFSM_MESSAGE_STATE) {
+			cfs_dom_debug(dfsm->log_domain, "received state message for %d/%d", nodeid, pid);
 
 			dfsm_node_info_t *ni;
 			
@@ -906,6 +922,8 @@ dfsm_cpg_deliver_callback(
 						goto leave;
 				}
 
+			} else {
+				cfs_dom_debug(dfsm->log_domain, "haven't received all states, waiting for more");
 			}
 
 			return;
@@ -914,6 +932,7 @@ dfsm_cpg_deliver_callback(
 	} else if (mode == DFSM_MODE_UPDATE) {
 
 		if (base_header->type == DFSM_MESSAGE_UPDATE) {
+			cfs_dom_debug(dfsm->log_domain, "received update message");
 			
 			int res = dfsm->dfsm_callbacks->dfsm_process_update_fn(
 				dfsm, dfsm->data, dfsm->sync_info, nodeid, pid, msg, msg_len);
@@ -925,6 +944,7 @@ dfsm_cpg_deliver_callback(
 
 		} else if (base_header->type == DFSM_MESSAGE_UPDATE_COMPLETE) {
 
+			cfs_dom_debug(dfsm->log_domain, "received update complete message");
 
 			int res = dfsm->dfsm_callbacks->dfsm_commit_fn(dfsm, dfsm->data, dfsm->sync_info);
 
@@ -1104,6 +1124,7 @@ dfsm_cpg_confchg_callback(
 	size_t joined_list_entries)
 {
 	cs_error_t result;
+	cfs_debug("dfsm_cpg_confchg_callback called");
 
 	dfsm_t *dfsm = NULL;
 	result = cpg_context_get(handle, (gpointer *)&dfsm);