From: Thomas Lamprecht
To: Proxmox VE development discussion, Fabian Grünbichler
Date: Wed, 30 Sep 2020 13:47:38 +0200
References: <20200930112131.2044392-1-f.gruenbichler@proxmox.com>
In-Reply-To: <20200930112131.2044392-1-f.gruenbichler@proxmox.com>
Subject: [pve-devel] applied: [PATCH cluster]
 pmxcfs: protect CPG operations with mutex

On 30.09.20 13:21, Fabian Grünbichler wrote:
> cpg_mcast_joined (and transitively, cpg_join/leave) are not thread-safe.
> pmxcfs triggers such operations via FUSE and CPG dispatch callbacks,
> which are running in concurrent threads.
>
> accordingly, we need to protect these operations with a mutex, otherwise
> they might return CS_OK without actually doing what they were supposed
> to do (which in turn can lead to the dfsm taking a wrong turn and
> getting stuck in a supposedly short-lived state, blocking access via
> FUSE and getting whole clusters fenced).
>
> huge thanks to Alexandre Derumier for providing the initial bug report
> and quite a lot of test runs while debugging this issue.
>
> Signed-off-by: Fabian Grünbichler
> ---
>
> Notes:
>     we could recycle sync_mutex, but that makes it harder to reason
>     about securing all code paths. it also protects non CPG operations
>     as part of the sync message queue handling, so mixing those up is
>     non-ideal.
>
>     @Alexandre: this is a slightly different approach compared to the test
>     build from yesterday, so if you want to test this as well it would
>     be very welcome :)
>
>  data/src/dfsm.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>

applied, much thanks to all involved!
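
For readers following along, the general pattern the commit message describes (serializing every call into a non-thread-safe API behind one dedicated mutex) can be sketched roughly as below. This is only an illustration under stated assumptions, not the applied patch: `unsafe_send`, `protected_send`, `cpg_mutex`, `worker` and `run_demo` are hypothetical names, `unsafe_send` is a stand-in rather than the corosync API, and plain pthreads are used here for self-containment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-thread-safe library call such as
 * cpg_mcast_joined; this is NOT the corosync API. The unguarded
 * read-modify-write on shared state is what makes it unsafe. */
static long messages_sent = 0;

static int unsafe_send(void)
{
    long seen = messages_sent;   /* read shared state */
    messages_sent = seen + 1;    /* write back; races if called concurrently */
    return 0;                    /* reports success even if an update was lost */
}

/* One dedicated mutex (assumed name) guarding all calls into the unsafe API. */
static pthread_mutex_t cpg_mutex = PTHREAD_MUTEX_INITIALIZER;

static int protected_send(void)
{
    pthread_mutex_lock(&cpg_mutex);
    int rc = unsafe_send();
    pthread_mutex_unlock(&cpg_mutex);
    return rc;
}

struct worker_args { long iterations; };

static void *worker(void *arg)
{
    const struct worker_args *wa = arg;
    for (long i = 0; i < wa->iterations; i++)
        protected_send();
    return NULL;
}

/* Run `nthreads` concurrent senders (at most 16) and return the total count. */
long run_demo(int nthreads, long iterations)
{
    pthread_t threads[16];
    struct worker_args wa = { iterations };

    assert(nthreads > 0 && nthreads <= 16);
    messages_sent = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &wa);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return messages_sent;
}
```

Calling `run_demo(4, 10000)` from a `main` (built with `-pthread`) should return `nthreads * iterations`, because every update happens under the lock. If `worker` called `unsafe_send` directly, updates could be lost while each call still reported success, which is loosely analogous to the "returns CS_OK without doing the work" symptom described above. Using a separate mutex rather than reusing an existing one mirrors the reasoning in the notes: it keeps the set of protected code paths easy to audit.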