From: storm <ralf.storm@konzept-is.de>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
Date: Wed, 10 Mar 2021 13:28:19 +0100
Message-ID: <6443ecf0-5d1e-ee29-c5aa-4332b192b8bd@konzept-is.de>
In-Reply-To: <20210310104731.GH3397@sv.lnf.it>
When operating a 3-node cluster, you have to ensure that at least 2
nodes are up and operational.
If you want to tolerate 2 nodes failing, you need to move to the next
odd number: 5 - you need at least a 5-node cluster to survive the loss
of two nodes without losing quorum.
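For a majority quorum (which Proxmox/corosync uses), the arithmetic is:

    quorum(N) = floor(N/2) + 1
    N = 3  ->  quorum 2  ->  tolerates 1 node down
    N = 5  ->  quorum 3  ->  tolerates 2 nodes down
    N = 7  ->  quorum 4  ->  tolerates 3 nodes down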
We have a 7-node cluster, so 3 nodes can fail, but we also had to raise
the Ceph pool size to 4, because with size 3 there is a high
probability that some placement groups become unavailable when three
nodes fail: all their replicas may sit on exactly the three nodes
which are down.
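As a minimal sketch - assuming a replicated pool named 'mypool' (the
pool name is only an example, substitute your own) - raising the
replication looks like this:

    # 'mypool' is a placeholder pool name
    ceph osd pool set mypool size 4
    ceph osd pool set mypool min_size 2

Keep in mind that size 4 also divides the usable capacity of the pool
by four.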
By the way - I think you should look at this hyperconverged solution as
if it were two different clusters, the Proxmox cluster and the Ceph
cluster: although it is "all in one node", you are operating two
clusters with different preconditions.
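For example, you check their health independently:

    pvecm status   # Proxmox/corosync cluster: quorum and membership
    ceph -s        # Ceph cluster: health, monitors, OSDs, PG states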
best regards
On 10/03/2021 at 11:47, Marco Gaiarin wrote:
> One of the most interesting configuration of PVE is the three node,
> switchless (full mesh) configuration, depicted in some PVE docs, most
> notably:
>
> https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
> https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark-2020-09
>
> But lurking on the 'ceph-user' mailing list some weeks ago led to an
> interesting discussion about 'failure domains', and many users depicted
> the three-node cluster as 'insecure'.
>
> The reasons are:
>
> a) 'min_size = 2' is a must if you need to keep your data safe; you can
> set 'min_size = 1', but clearly there's no scrub/checksumming, so no
> real guarantee against data corruption.
>
> b) but in a three-node setup with 'min_size = 2', if a node goes down,
> the cluster switches to 'read-only' at the very first subsequent
> failure, i.e. the cluster does not handle more than one failure.
>
> c) you can change the failure domain, e.g.:
>     mon osd down out subtree limit = osd
> but in this way you have to guarantee (in the worst case) room for
> double the space on a single node (e.g., a three-node cluster with 2TB
> of space each: to guarantee 'min_size = 2' you cannot use more than 1TB
> of space on the overall cluster; so, 6TB of total disk space for 1TB of
> usable space).
>
>
> Am I wrong? If not, is the 3-node hyperconverged cluster suitable only
> for testing?
>
>
> Thanks.
>