public inbox for pve-user@lists.proxmox.com
From: Laurent Dumont <laurentfdumont@gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Cc: "pve-user@pve.proxmox.com" <pve-user@pve.proxmox.com>,
	Eneko Lacunza <elacunza@binovo.es>
Subject: Re: [PVE-User] BIG cluster questions
Date: Fri, 25 Jun 2021 13:33:37 -0400	[thread overview]
Message-ID: <CAOAKi8wnK6E97E5uDX7CH2zr1hTnonszmiTMMTkqK9-S36xcyg@mail.gmail.com> (raw)
In-Reply-To: <mailman.16.1624545042.464.pve-user@lists.proxmox.com>

This is anecdotal, but I have never seen a cluster that big. You might
want to inquire about professional support, which would give you a better
perspective at that kind of scale.

On Thu, Jun 24, 2021 at 10:30 AM Eneko Lacunza via pve-user <
pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Eneko Lacunza <elacunza@binovo.es>
> To: "pve-user@pve.proxmox.com" <pve-user@pve.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 24 Jun 2021 16:30:31 +0200
> Subject: BIG cluster questions
> Hi all,
>
> We're currently helping a customer to configure a virtualization cluster
> with 88 servers for VDI.
>
> Right now we're testing the feasibility of building a single Proxmox
> cluster of 88 nodes. A 4-node cluster has been configured as well, for
> comparison (same servers, networking and racks).
>
> Each node has two NICs with 2x25 Gbps ports. Two LACP bonds are
> currently configured (one per NIC): one for storage (NFS v4.2) and one
> for everything else (VMs, cluster traffic).
>
> Cluster has two rings, one on each bond.
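> A minimal sketch of how two rings over two bonds typically map onto
> corosync.conf (the interface names, addresses and cluster name below are
> assumptions for illustration, not taken from the actual cluster):

```
# /etc/corosync/corosync.conf (fragment) -- hypothetical values
totem {
  version: 2
  cluster_name: vdi-cluster
  link_mode: passive
  interface {
    linknumber: 0        # ring 0, over the storage bond (bond0)
    knet_transport: udp
  }
  interface {
    linknumber: 1        # ring 1, over the VM/cluster bond (bond1)
    knet_transport: udp
  }
}
nodelist {
  node {
    name: node01
    nodeid: 1
    ring0_addr: 10.0.0.1   # address on bond0
    ring1_addr: 10.1.0.1   # address on bond1
  }
  # ... one node {} entry per cluster member
}
```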
>
> - With the clusters at rest (no significant number of VMs running), we
> see quite different average corosync/knet latencies: ~300-400 on the
> 88-node cluster versus <100 on the 4-node cluster.
>
>
> For 88-node cluster:
>
> - Creating some VMs (say 16), one every 30 s, works well.
> - Destroying some VMs (say 16), one every 30 s, produces error
> messages (storage cfs lock related) and fails to remove some of the VMs.
>
> - Rebooting 32 nodes, one every 30 seconds (a node takes about 120 s to
> boot) so that quorum is never lost, creates a cluster traffic "flood".
> Some of the rebooted nodes don't rejoin the cluster, and the web UI
> shows all nodes in cluster quorum with a grey ?, instead of a green OK.
> In this situation corosync latency on some nodes can skyrocket to tens
> or hundreds of times the values before the reboots. Access to pmxcfs
> becomes very slow, and we have only been able to fix the issue by
> rebooting all nodes.
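> The quorum arithmetic behind that rolling-reboot schedule can be checked
> with a short sketch (numbers taken from the message above; this only
> models the timing, not corosync's actual behaviour under load):

```python
import math


def max_concurrent_down(boot_s: float, interval_s: float) -> int:
    # With one reboot started every interval_s seconds and each node down
    # for boot_s seconds, at most ceil(boot_s / interval_s) nodes are down
    # at the same time.
    return math.ceil(boot_s / interval_s)


def keeps_quorum(total_nodes: int, boot_s: float, interval_s: float) -> bool:
    # Quorum in a corosync cluster requires a strict majority of votes.
    down = max_concurrent_down(boot_s, interval_s)
    quorum = total_nodes // 2 + 1
    return total_nodes - down >= quorum


# 88 nodes, ~120 s boot, one reboot every 30 s: at most 4 nodes are down
# at once, 84 remain, and quorum (45) is never lost.
print(max_concurrent_down(120, 30))   # -> 4
print(keeps_quorum(88, 120, 30))      # -> True
```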
>
> - We have tried changing the transport of knet in a ring from UDP to
> SCTP as reported here:
>
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/page-2
> This gives better corosync latencies, but the reboot issue persists.
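> For reference, that transport change amounts to something like the
> following per-link setting in corosync.conf (the link numbering here is
> an assumption; the change has to be made on all nodes, followed by a
> corosync restart):

```
totem {
  interface {
    linknumber: 0
    knet_transport: sctp   # previously udp
  }
}
```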
>
> We don't know whether both issues are related or not.
>
> Could LACP bonds be the issue?
>
> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_configuration
> "
> If your switch support the LACP (IEEE 802.3ad) protocol then we
> recommend using the corresponding bonding mode (802.3ad). Otherwise you
> should generally use the active-backup mode.
> If you intend to run your cluster network on the bonding interfaces,
> then you have to use active-passive mode on the bonding interfaces,
> other modes are unsupported.
> "
> Per that second paragraph, we understand that running the cluster
> network over an LACP bond is not supported; is that interpretation
> correct? We're in the process of reconfiguring nodes/switches to test
> without a bond, to see if that gives us a stable cluster (we will
> report on this). Do you think this could be the issue?
>
>
> Now for more general questions: do you think an 88-node Proxmox VE
> cluster is feasible?
>
> Those 88 nodes will host about 14,000 VMs. Will the HA manager be able
> to manage that many, or is that too many? (HA for those VMs doesn't
> seem to be a requirement right now.)
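> As a rough scale check, plain arithmetic on the numbers above gives the
> average VM density per node (this says nothing about whether the HA
> manager copes at that count, only about the load per host):

```python
vms = 14_000   # planned VM count
nodes = 88     # cluster size
print(round(vms / nodes))  # -> 159 VMs per node on average
```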
>
>
> Thanks a lot
> Eneko
>
>
> Eneko Lacunza
>
> CTO | Technical Director (Zuzendari teknikoa)
>
> Binovo IT Human Project
>
> 943 569 206 | elacunza@binovo.es | binovo.es
>
> Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
>
>


Thread overview: 4+ messages
     [not found] <mailman.16.1624545042.464.pve-user@lists.proxmox.com>
2021-06-25 17:33 ` Laurent Dumont [this message]
2021-06-26 11:16 ` aderumier
2021-06-26 12:59 JR Richardson
     [not found] <a4c39bce-b416-1286-3374-fc73afa41125@binovo.es>
2021-06-28  9:32 ` Thomas Lamprecht
