From: "JR Richardson"
Date: Sat, 26 Jun 2021 07:59:43 -0500
Message-ID: <000001d76a8b$2271a2f0$6754e8d0$@gmail.com>
Subject: Re: [PVE-User] BIG cluster questions
List-Id: Proxmox VE user list

That is a big cluster, I like it, hope it works out.

You should put the corosync/heartbeat network on its own physical Ethernet link. This is probably where your latency is coming from. Even with 25Gig NICs, if you push all your data, migration, and heartbeat traffic across one physical link, bonded or not, a busy link can leave your corosync traffic queued. Even a few milliseconds of queueing adds up across many nodes. Think about jumbo frames as well: slam a NIC with 9000-byte packets for storage, and the poor little heartbeat packets start queueing up behind them.

The Proxmox design notes also highly recommend separating all needed networks onto their own physical NICs and switches.

Good luck.

JR Richardson
Engineering for the Masses
Chasing the Azeotrope
JRx DistillCo
1'st Place Brisket
1'st Place Chili

This is anecdotal, but I have never seen a cluster that big. You might want to inquire about professional support, which would give you a better perspective at that kind of scale.

On Thu, Jun 24, 2021 at 10:30 AM Eneko Lacunza via pve-user <pve-user@lists.proxmox.com> wrote:

> ---------- Forwarded message ----------
> From: Eneko Lacunza
> To: "pve-user@pve.proxmox.com"
> Cc:
> Bcc:
> Date: Thu, 24 Jun 2021 16:30:31 +0200
> Subject: BIG cluster questions
> Hi all,
>
> We're currently helping a customer to configure a virtualization
> cluster with 88 servers for VDI.
>