From: Laurent Dumont <laurentfdumont@gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Cc: "pve-user@pve.proxmox.com" <pve-user@pve.proxmox.com>,
Eneko Lacunza <elacunza@binovo.es>
Subject: Re: [PVE-User] BIG cluster questions
Date: Fri, 25 Jun 2021 13:33:37 -0400
Message-ID: <CAOAKi8wnK6E97E5uDX7CH2zr1hTnonszmiTMMTkqK9-S36xcyg@mail.gmail.com>
In-Reply-To: <mailman.16.1624545042.464.pve-user@lists.proxmox.com>
This is anecdotal, but I have never seen a cluster that big. You might
want to ask about professional support, which would give you a better
perspective at that kind of scale.

On Thu, Jun 24, 2021 at 10:30 AM Eneko Lacunza via pve-user
<pve-user@lists.proxmox.com> wrote:
>
> ---------- Forwarded message ----------
> From: Eneko Lacunza <elacunza@binovo.es>
> To: "pve-user@pve.proxmox.com" <pve-user@pve.proxmox.com>
> Date: Thu, 24 Jun 2021 16:30:31 +0200
> Subject: BIG cluster questions
> Hi all,
>
> We're currently helping a customer to configure a virtualization cluster
> with 88 servers for VDI.
>
> Right now we're testing the feasibility of building a single Proxmox
> cluster of 88 nodes. A 4-node cluster has also been configured for
> comparison (same servers and networking/racks).
>
> Each node has two dual-port 25 Gbps NICs. Currently there are two LACP
> bonds configured (one per NIC): one for storage (NFS v4.2) and the other
> for everything else (VMs, cluster).
>
> The cluster has two rings, one on each bond.
>
> - With the clusters at rest (no significant number of VMs running), we see
> quite different average corosync/knet latencies on the 88-node cluster
> (~300-400) and on the 4-node cluster (<100).
>
> For the 88-node cluster:
>
> - Creating some VMs (say 16), one every 30 s, works well.
> - Destroying some VMs (say 16), one every 30 s, outputs error
> messages (related to storage cfs locks) and fails to remove some of the VMs.
>
> - Rebooting 32 nodes, one every 30 seconds (a node takes about 120 s to
> boot) so that quorum is never lost, creates a "flood" of cluster traffic.
> Some of the rebooted nodes don't rejoin the cluster, and the web UI (WUI)
> shows all nodes in the cluster quorum with a grey "?" instead of a green OK.
> In this situation corosync latency on some nodes can skyrocket to tens or
> hundreds of times the values seen before the reboots. Access to pmxcfs is
> very slow, and the only way we have found to fix the issue is to reboot
> all nodes.
>
> - We have tried changing the knet transport of one ring from UDP to
> SCTP, as reported here:
>
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871/page-2
> This gives better corosync latencies, but the reboot issue persists.
>
> We don't know whether the two issues are related.
>
> Could LACP bonds be the issue?
>
> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_configuration
> "
> If your switch support the LACP (IEEE 802.3ad) protocol then we
> recommend using the corresponding bonding mode (802.3ad). Otherwise you
> should generally use the active-backup mode.
> If you intend to run your cluster network on the bonding interfaces,
> then you have to use active-passive mode on the bonding interfaces,
> other modes are unsupported.
> "
> Going by the second paragraph, we understand that running the cluster
> network over an LACP bond is not supported; can you confirm this
> interpretation? We're in the process of reconfiguring nodes/switches to
> test without a bond, to see whether that gives us a stable cluster (will
> report back on this). Do you think this could be the issue?
>
> Now for more general questions: do you think an 88-node Proxmox VE
> cluster is feasible?
>
> Those 88 nodes will host about 14,000 VMs. Will the HA manager be able to
> handle them, or are there too many? (HA for those VMs doesn't seem to be
> a requirement right now.)
>
> Thanks a lot
> Eneko
>
> Eneko Lacunza
> CTO | Zuzendari teknikoa (Technical Director)
> Binovo IT Human Project
> 943 569 206
> elacunza@binovo.es
> binovo.es
> Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
> youtube <https://www.youtube.com/user/CANALBINOVO/>
> linkedin <https://www.linkedin.com/company/37269706/>
>
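The rolling-reboot plan in the quoted message (32 of 88 nodes, one every 30 s, about 120 s each) can be sanity-checked with simple quorum arithmetic. The Python sketch below is a hypothetical model, not Proxmox or corosync code: it assumes one vote per node, a strict-majority quorum, and treats the ~120 s boot time as the whole period a node is away. The helper names (`quorum_threshold`, `max_nodes_down`, `rolling_reboot_keeps_quorum`) are illustrative only.

```python
"""Rough check of a rolling-reboot plan: does quorum hold throughout?

Hypothetical model (assumptions, not taken from Proxmox or corosync):
- every node has exactly one vote, so quorum is a strict majority;
- node i goes down at t = i * interval and is back after boot_time seconds;
- a node counts as down for the half-open interval [start, start + boot_time).
"""


def quorum_threshold(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def max_nodes_down(reboots: int, interval_s: int, boot_time_s: int) -> int:
    """Peak number of nodes simultaneously down during the rolling reboot."""
    windows = [(i * interval_s, i * interval_s + boot_time_s) for i in range(reboots)]
    # The down-count only increases at a start time, so the peak is at one of them.
    return max(
        sum(1 for start, end in windows if start <= t < end)
        for t, _ in windows
    )


def rolling_reboot_keeps_quorum(
    nodes: int, reboots: int, interval_s: int, boot_time_s: int
) -> bool:
    """True if the remaining votes never drop below the quorum threshold."""
    down = max_nodes_down(reboots, interval_s, boot_time_s)
    return nodes - down >= quorum_threshold(nodes)


if __name__ == "__main__":
    nodes, reboots, interval, boot = 88, 32, 30, 120  # figures from the message above
    down = max_nodes_down(reboots, interval, boot)
    print(f"quorum needs {quorum_threshold(nodes)} of {nodes} votes")
    print(f"at most {down} nodes down at once -> {nodes - down} votes remain")
    print("quorum kept:", rolling_reboot_keeps_quorum(nodes, reboots, interval, boot))
```

Under these assumptions at most 4 nodes are down at any moment, leaving 84 of 88 votes against a quorum of 45, so the instability described above does not appear to come from a plain loss of votes.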