public inbox for pve-user@lists.proxmox.com
* [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
@ 2021-03-10 10:47 Marco Gaiarin
  2021-03-10 12:28 ` storm
  0 siblings, 1 reply; 4+ messages in thread
From: Marco Gaiarin @ 2021-03-10 10:47 UTC (permalink / raw)
  To: pve-user


One of the most interesting PVE configurations is the three-node,
switchless (full mesh) setup described in some PVE docs, most
notably:

	https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
	https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark-2020-09

But some weeks ago, while lurking on the 'ceph-user' mailing list, I
followed an interesting discussion about 'failure domains', in which many
users described the three-node cluster as 'insecure'.

The reasoning goes as follows:

a) 'min_size = 2' is a must if you need to keep your data safe; you can
 set 'min_size = 1', but clearly there is then no scrub/checksumming, so
there is no real guarantee against data corruption.

b) but in a three-node setup with 'min_size = 2', if a node goes down,
 the cluster switches to 'readonly' at the very first subsequent failure,
i.e. the cluster cannot handle more than one failure.

c) you can change the failure domain, e.g.:
	mon osd down out subtree limit = osd
 but then you have to guarantee (in the worst case) room for twice the
space of a single node (e.g. in a three-node cluster with 2TB of space
each, to guarantee 'min_size = 2' you cannot use more than 1TB of space
on the overall cluster; so 6TB of total disk space for 1TB of usable
space).
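
The rule behind points a) and b) can be sketched in a few lines of Python. This is only an illustrative model, not Ceph code: the function name `pg_accepts_io` and its parameters are hypothetical, and it assumes a replicated pool with one replica per node (failure domain = host). Note that it models the state described above as 'readonly' simply as "no longer serving I/O".

```python
def pg_accepts_io(size: int, min_size: int, replicas_lost: int) -> bool:
    """Return True while a placement group still has at least min_size replicas.

    size          -- number of replicas the pool keeps (pool 'size')
    min_size      -- minimum number of available replicas required for I/O
    replicas_lost -- replicas currently unavailable (failed node or OSD)
    """
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    if not 0 <= replicas_lost <= size:
        raise ValueError("replicas_lost must be between 0 and size")
    return size - replicas_lost >= min_size


if __name__ == "__main__":
    # Three nodes, size = 3, min_size = 2: the case discussed in b).
    for lost in range(4):
        state = "serves I/O" if pg_accepts_io(3, 2, lost) else "blocked"
        print(f"replicas lost: {lost} -> {state}")
```

With size = 3 and min_size = 2 the model keeps serving I/O after one lost replica and blocks after the second, which is the behaviour point b) describes.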


Am I wrong? If not, is the 3-node hyperconverged cluster suitable only
for testing?


Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)




* Re: [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
  2021-03-10 10:47 [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains Marco Gaiarin
@ 2021-03-10 12:28 ` storm
       [not found]   ` <mailman.58.1615379757.456.pve-user@lists.proxmox.com>
  2021-03-10 14:59   ` Marco Gaiarin
  0 siblings, 2 replies; 4+ messages in thread
From: storm @ 2021-03-10 12:28 UTC (permalink / raw)
  To: pve-user

When operating a 3-node cluster, you have to ensure that at least 2 
nodes are up and operational.

If you want to tolerate 2 nodes failing, you need to move to the next 
odd number, 5: you need at least a 5-node cluster if you want to 
survive the loss of two nodes without problems.
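
The "next odd number" rule follows from majority quorum, and a minimal Python sketch makes the arithmetic explicit. The function names are hypothetical, and the sketch assumes a plain majority vote with one vote per node, nothing more.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while a majority is still up."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(f"{n} nodes: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this model an even node count tolerates no more failures than the odd count just below it (4 nodes tolerate 1 failure, the same as 3), which is why the step goes from 3 to 5 and then to 7.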

We have a 7-node cluster, so 3 nodes can fail, but we also had to raise 
the Ceph pool size to 4, because if three nodes fail there is a high 
probability that some placement groups will be unavailable because they 
were replicated only to the three nodes that are down.
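
The effect of raising size from 3 to 4 can be estimated with a small combinatorial sketch in Python. It rests on assumptions that are not from this thread: every placement group puts its replicas on distinct nodes chosen uniformly at random (real CRUSH placement is pseudo-random and weighted), only whole-node failures are counted, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def surviving_replica_distribution(nodes: int, size: int, failed: int) -> dict[int, Fraction]:
    """Map 'replicas still up' -> fraction of PGs in that state.

    Assumes each PG holds `size` replicas on distinct nodes picked
    uniformly at random, and that exactly `failed` nodes are down.
    """
    if not (0 < size <= nodes and 0 <= failed <= nodes):
        raise ValueError("need 0 < size <= nodes and 0 <= failed <= nodes")
    total = comb(nodes, size)
    dist: dict[int, Fraction] = {}
    for lost in range(min(size, failed) + 1):
        # Ways to place `lost` replicas on failed nodes and the rest on live ones.
        ways = comb(failed, lost) * comb(nodes - failed, size - lost)
        if ways:
            dist[size - lost] = Fraction(ways, total)
    return dist


def fraction_below(nodes: int, size: int, failed: int, min_size: int) -> Fraction:
    """Fraction of PGs left with fewer than min_size replicas."""
    dist = surviving_replica_distribution(nodes, size, failed)
    return sum((p for up, p in dist.items() if up < min_size), Fraction(0))


if __name__ == "__main__":
    for size in (3, 4):
        dist = surviving_replica_distribution(nodes=7, size=size, failed=3)
        print(f"size={size}: no replica left for {dist.get(0, Fraction(0))} of PGs, "
              f"below min_size=2 for {fraction_below(7, size, 3, 2)} of PGs")
```

Under these assumptions, with 7 nodes and 3 of them down, size = 3 leaves 1 in 35 placement groups with no replica at all and 13 in 35 below min_size = 2. With size = 4 no placement group loses every replica and 4 in 35 fall below min_size = 2.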


By the way, I think you should look at this hyperconverged solution as 
if it were two different clusters, the Proxmox cluster and the Ceph 
cluster. Although it is "all in one node", you are operating two 
clusters with different preconditions.


best regards






* Re: [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
       [not found]   ` <mailman.58.1615379757.456.pve-user@lists.proxmox.com>
@ 2021-03-10 13:13     ` storm
  0 siblings, 0 replies; 4+ messages in thread
From: storm @ 2021-03-10 13:13 UTC (permalink / raw)
  To: pve-user

Hello Eneko,

Maybe I am paranoid, but I have already experienced this situation, 
where PGs are unavailable until you fix the node/OSD that holds the 
missing PG: replica=3 in a 7-node cluster, with a failing OSD on one 
node that was up and two other nodes down.

Also keep in mind that the cluster will be really slow when a lot of 
nodes fail, and recovery can take a long time...

So better be prepared.

As an admin, I never want to be responsible for any data loss.


Best regards, and better be paranoid (always expect the worst-case 
scenario to happen, as it will)


On 10/03/2021 at 13:35, Eneko Lacunza via pve-user wrote:




* Re: [PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...
  2021-03-10 12:28 ` storm
       [not found]   ` <mailman.58.1615379757.456.pve-user@lists.proxmox.com>
@ 2021-03-10 14:59   ` Marco Gaiarin
  1 sibling, 0 replies; 4+ messages in thread
From: Marco Gaiarin @ 2021-03-10 14:59 UTC (permalink / raw)
  To: pve-user

Hello, storm!
  On that day you said...

> when operating a 3-node cluster, you have to ensure that at least 2 nodes
> are up and operational.
> If you want the possibility for 2 nodes failing, you need to move to the
> next odd number: 5 - you need at least a 5 node cluster if you want to
> survive the loss of two nodes without problems.

As I supposed, then.


Thanks.




