To: pve-user@lists.proxmox.com
From: storm
Date: Wed, 10 Mar 2021 13:28:19 +0100
Subject: Re: [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
Message-ID: <6443ecf0-5d1e-ee29-c5aa-4332b192b8bd@konzept-is.de>
In-Reply-To: <20210310104731.GH3397@sv.lnf.it>
References: <20210310104731.GH3397@sv.lnf.it>

When operating a 3-node cluster, you have to ensure that at least 2
nodes are up and operational. If you want to tolerate 2 nodes failing,
you need to move to the next odd number: you need at least a 5-node
cluster if you want to survive the loss of two nodes without problems.

We have a 7-node cluster, so 3 nodes can fail, but we also have to raise
the Ceph 'size' to 4. With a smaller size, if three nodes fail there is
a high probability that some placement groups will be unavailable,
because they were replicated only to the three nodes which are down.

Btw, I think you should look at this hyperconverged solution as if it
were two different clusters: the Proxmox cluster and the Ceph cluster.
Although it is "all in one node", you are operating two clusters with
different preconditions.

Best regards

On 10/03/2021 at 11:47, Marco Gaiarin wrote:
> One of the most interesting configurations of PVE is the three-node,
> switchless (full mesh) configuration, depicted in some PVE docs, most
> notably:
>
> https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark-2020-09
>
> But lurking on the 'ceph-user' mailing list, some weeks ago, led to an
> interesting discussion about 'failure domains', and many users depicted
> the three-node cluster as 'insecure'.
>
> The reasoning is as follows:
>
> a) 'min_size = 2' is a must if you need to keep your data safe; you can
> set 'min_size = 1', but clearly there's no scrub/checksumming, so no
> real guarantee against data corruption.
>
> b) but in a three-node setup, with 'min_size = 2', if a node goes down,
> the cluster switches to 'readonly' at the very first subsequent failure,
> i.e. the cluster does not handle more than one failure.
>
> c) you can change the failure domain, e.g.:
>     mon osd down out subtree limit = osd
> but in this way you have to guarantee (in the worst case) room for
> double the space of a single node (e.g. a three-node cluster with 2TB of
> space each: to guarantee 'min_size = 2' you cannot use more than 1TB
> of space on the overall cluster; so, 6TB of total disk space for 1TB of
> usable space).
>
>
> Am I wrong? If not, is the 3-node hyperconverged cluster suitable only
> for testing?
>
>
> Thanks.
>
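
The node-count arithmetic discussed in this thread can be sketched in a
few lines of Python. This is only an illustration, assuming a strict
majority quorum for the Proxmox cluster and exactly one replica per node
(host-level failure domain) for Ceph. The function names are made up for
this sketch and are not part of any Proxmox or Ceph tooling.

```python
def quorum(nodes: int) -> int:
    """Smallest strict majority of `nodes` votes (assumed quorum rule)."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """Nodes that may fail while a strict majority is still up."""
    return nodes - quorum(nodes)


def worst_case_replicas_left(size: int, failed_nodes: int) -> int:
    """Replicas of one placement group that survive in the worst case.

    Assumes one replica per node, and that every failed node happened
    to hold a replica of this placement group.
    """
    return max(size - failed_nodes, 0)


if __name__ == "__main__":
    # Cluster sizes mentioned in the thread, plus 4 to show why an even
    # node count does not tolerate more failures than the odd one below it.
    for nodes in (3, 4, 5, 7):
        print(f"{nodes} nodes: quorum {quorum(nodes)}, "
              f"tolerates {tolerable_failures(nodes)} failed node(s)")

    # 7-node case: three failed nodes, pool size 3 versus pool size 4.
    for size in (3, 4):
        left = worst_case_replicas_left(size, failed_nodes=3)
        print(f"size={size}, 3 nodes down: worst case {left} replica(s) left")
```

With size 3 the worst case leaves no replica of a placement group on the
surviving nodes, while size 4 always leaves at least one. Whether that
placement group still accepts I/O depends on how the surviving replica
count compares with the pool's 'min_size'.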