From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jmr.richardson@gmail.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id AF2B270EC5
 for <pve-user@lists.proxmox.com>; Sat, 26 Jun 2021 14:59:53 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id A0B361DF26
 for <pve-user@lists.proxmox.com>; Sat, 26 Jun 2021 14:59:53 +0200 (CEST)
Received: from mail-ot1-x32e.google.com (mail-ot1-x32e.google.com
 [IPv6:2607:f8b0:4864:20::32e])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 21BF01DF17
 for <pve-user@lists.proxmox.com>; Sat, 26 Jun 2021 14:59:53 +0200 (CEST)
Received: by mail-ot1-x32e.google.com with SMTP id
 h24-20020a9d64180000b029036edcf8f9a6so12489834otl.3
 for <pve-user@lists.proxmox.com>; Sat, 26 Jun 2021 05:59:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=from:to:subject:date:message-id:mime-version
 :content-transfer-encoding:thread-index:content-language;
 bh=WKHCxdxxjqWgyKaA3PDdKRC1cgq/dx2USnIQhDGdWI8=;
 b=tnfFQrpUbYDL+MWdwy5dpoSOqTELF6DVymejyUPFk+8l3oH3j8CybE3dAYIHqRSLkh
 8BRBKSIx794Sd2lxu3dibpsJ5GaiEDbYXCY0+8qAwynAgwq1bY9mkrTR5tPax84F0yY3
 HsXMlxNvgTyjVoah0hZHotqfjsCDIMedsBaCwnPWnMod4+DBOMaSr474cMtgxcVYs6TD
 6MWkwE8Nmsiw7otRCsT4scTLc6pQVNxWx/Kyu5S+u2Jaugo/RMXR4E7JkrtDJp4WyIAe
 F5f832Tw1ROD6jMlRDAXia4k+PhuwQoH9MLSwPxQBUwg6qOOMjAvdoqEFZEShMdBYxD4
 w8vQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:subject:date:message-id:mime-version
 :content-transfer-encoding:thread-index:content-language;
 bh=WKHCxdxxjqWgyKaA3PDdKRC1cgq/dx2USnIQhDGdWI8=;
 b=XdvO2oxZ7NVbwvK0R/x+MHxqL1lbtOtmpQ5HMWoa14jP/sHYXDCy3GsbkI5yyWFHm5
 qgGqOdUTwI47BSiNxi82880BHHdQVP+FkxVSHpEeIIyMo0IMDx3757Ru1T7OND/IJIN7
 UI+Vv4aGKr7B0J8c6Yp1iKcM0waVwxW85+NTa9VtHVr9Lfa1+Z9dSL0WUPnWnBIFDTp5
 1HtSQ8wA0AobcprP2Z0ssAr5rpkRR97t0wf/ArqBkVyUXNEwqDq5ef2GmKEi0jfqp6V+
 F8s4XJSGEZ4yk/GA37xC+Evm9Md+KYj/Qzh5jTC8xGAgb3BqTltj706L87Cr7zbfK5mh
 YdZw==
X-Gm-Message-State: AOAM532XLD0BErY3khGd/9vQblkVBOb1W/4DusnwVFSiTbLvsrAXKQpM
 jYzICj8w2+18q7eCkKAAVcnK+freHPY=
X-Google-Smtp-Source: ABdhPJyHKvXWKtX01G00vt467X0a0G4C/kp3qmpVOjRX/Y3t0tIXdHmOA8RsalTSBPkzt9aZlbCSnQ==
X-Received: by 2002:a9d:27a4:: with SMTP id c33mr14064952otb.281.1624712385566; 
 Sat, 26 Jun 2021 05:59:45 -0700 (PDT)
Received: from JRT7500 (cpe-76-85-93-15.tx.res.rr.com. [76.85.93.15])
 by smtp.gmail.com with ESMTPSA id n16sm1449640otr.30.2021.06.26.05.59.44
 for <pve-user@lists.proxmox.com>
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 26 Jun 2021 05:59:45 -0700 (PDT)
From: "JR Richardson" <jmr.richardson@gmail.com>
To: <pve-user@lists.proxmox.com>
Date: Sat, 26 Jun 2021 07:59:43 -0500
Message-ID: <000001d76a8b$2271a2f0$6754e8d0$@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Outlook 16.0
Thread-Index: Addqix+y26eYSN4mSmy8N83i/QqpRQ==
Content-Language: en-us
X-SPAM-LEVEL: Spam detection results:  0
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DKIM_SIGNED               0.1 Message has a DKIM or DK signature,
 not necessarily valid
 DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature
 DKIM_VALID_AU -0.1 Message has a valid DKIM or DK signature from author's
 domain
 DKIM_VALID_EF -0.1 Message has a valid DKIM or DK signature from envelope-from
 domain
 FREEMAIL_FROM 0.001 Sender email is commonly abused enduser mail provider
 RCVD_IN_DNSWL_NONE     -0.0001 Sender listed at https://www.dnswl.org/,
 no trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [PVE-User] BIG cluster questions
X-BeenThere: pve-user@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE user list <pve-user.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-user/>
List-Post: <mailto:pve-user@lists.proxmox.com>
List-Help: <mailto:pve-user-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Sat, 26 Jun 2021 12:59:53 -0000

That is a big cluster, I like it, and I hope it works out. You should put the
corosync/heartbeat network on its own physical Ethernet link. This is
probably where your latency is coming from. Even with 25Gig NICs, if you push
all your data, migration, and heartbeat traffic across one physical link,
bonded or not, a busy link can leave your corosync traffic sitting in a
queue. Even a few milliseconds of delay will add up across many nodes. Think
about jumbo frames as well: slam a NIC with 9000-byte packets for storage,
and the poor little heartbeat packets start queueing up behind them.
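
To put rough numbers on the queueing point, here is a back-of-the-envelope
sketch in Python. It only computes how long it takes to serialize frames onto
the wire. The function names and the queue depths are made up for
illustration, and it ignores Ethernet framing overhead, switch buffering, and
any QoS, so treat the results as an order-of-magnitude estimate, not a
measurement.

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Time to clock one frame onto the wire, in microseconds.

    bits / (Gbit/s * 1000) gives microseconds; framing overhead is ignored.
    """
    return frame_bytes * 8 / (link_gbps * 1000)


def queue_wait_ms(frames_ahead: int, frame_bytes: int, link_gbps: float) -> float:
    """Wait for a packet stuck behind `frames_ahead` equally sized frames, in ms."""
    return frames_ahead * serialization_delay_us(frame_bytes, link_gbps) / 1000


if __name__ == "__main__":
    # One 9000-byte jumbo frame on a 25 Gbit/s link.
    per_frame = serialization_delay_us(9000, 25)
    print(f"one jumbo frame: {per_frame:.2f} us")

    # Hypothetical queue depths, chosen only to show how the wait scales.
    for depth in (10, 100, 1000):
        wait = queue_wait_ms(depth, 9000, 25)
        print(f"{depth:>5} jumbo frames ahead: {wait:.3f} ms")
```

A single jumbo frame is only a few microseconds on a 25Gig link, so one frame
is harmless. The wait grows linearly with however many frames are sitting
ahead of the heartbeat packet, which is why a saturated shared link is the
concern and a dedicated link avoids it.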

In the design notes for Proxmox, it's highly recommended to put each of the
networks you need on its own physical NICs and switches as well.

Good luck.

JR Richardson
Engineering for the Masses
Chasing the Azeotrope
JRx DistillCo
1'st Place Brisket
1'st Place Chili

This is anecdotal, but I have never seen a cluster that big. You might want
to inquire about professional support, which would give you a better
perspective on that kind of scale.

On Thu, Jun 24, 2021 at 10:30 AM Eneko Lacunza via pve-user <
pve-user@lists.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Eneko Lacunza <elacunza@binovo.es>
> To: "pve-user@pve.proxmox.com" <pve-user@pve.proxmox.com>
> Cc:
> Bcc:
> Date: Thu, 24 Jun 2021 16:30:31 +0200
> Subject: BIG cluster questions
> Hi all,
>
> We're currently helping a customer to configure a virtualization 
> cluster with 88 servers for VDI.
>