From: Hermann Himmelbauer
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system
Date: Mon, 16 Nov 2020 18:21:24 +0100
Message-ID: <23e655af-534e-3066-e3cd-514421730aa2@qwer.tk>
In-Reply-To: <0e58d1d5-384b-55d2-9042-ae8c1e2ade6c@qwer.tk>
References: <0e58d1d5-384b-55d2-9042-ae8c1e2ade6c@qwer.tk>
List-Id: Proxmox VE user list

Hi,

In case someone is interested: the problem is now solved. The system
seems to be rock solid after ~2 months of testing.

I changed the AMD Ryzen 3 3200G to an AMD Ryzen 5 3600 on one node and to
an AMD Ryzen 3 3100 on the two other nodes, and now the problem is gone.

I don't really know why, but I can think of two reasons:

1) The 3200G does not support ECC, but I use ECC RAM. Maybe this led to
   errors (although intensive memory testing with memtest86 did not
   report anything).
2) The new CPUs do not have integrated graphics. I noticed that the two
   onboard 10GBit Ethernet adapters now have different PCI addresses with
   the new CPUs. And with the old CPUs there were problems with these 10G
   adapters malfunctioning.

Many thanks for the input and your help. The ASRock Rack X470D4U2-2T is
definitely stable now.

Best Regards,
Hermann

On 04.09.20 at 16:45, Hermann Himmelbauer wrote:
> Dear Proxmox users,
>
> I'm trying to install a 3-node cluster (latest Proxmox/Ceph) and
> experience random freezes. A node can either be completely frozen (no
> blinking cursor on console, no ping) or can get somewhat blocked / slow etc.
>
> This happens most often on node 2 (approx.
> 3-4 times / day), node 3 never got stuck within 14 days runtime, node 1
> once.
>
> Unfortunately I did not find any way to trigger this behaviour; however,
> I *think* that this happens most often if I stress the machine in some
> way (performance test within a virtual machine) and then let it idle.
>
> When the machine freezes completely, there is no logfile. However, if it
> is partially frozen, some info can be acquired via dmesg (see attached
> file). "device=2b:00.0" is an Intel 10GBit Ethernet adapter (X550T), so
> perhaps there is some driver issue regarding this Ethernet adapter?
>
> The system consists of the following components:
>
> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
> - ASRock Rack X470D4U2-2T (Mainboard)
> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
>   Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
>   (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
>
> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
>
> What I did so far (without success):
>
> - Disabled C6, as I read that this CPU state can lead to unstable systems
>   (via "python zenstates.py --c6-disable" -> still errors).
> - Updated my BIOS to the latest version (3.30)
> - Checked that the CPU + RAM are compatible with the mainboard (they are
>   listed as compatible on the ASRock website)
> - Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
> - Memory test (memtest86, no errors)
>
> Do you have any clue what could be the reason for these freezes? Should
> I suspect some hardware error? Or is this some known Linux bug that can
> be fixed?
>
> Best Regards,
> Hermann
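
For anyone hitting similar freezes, the two guesses in the reply above (ECC not being active, and the onboard NICs moving to other PCI addresses) could be checked with a small script. This is only a sketch and was not run on the machines in this thread. It assumes the kernel exposes EDAC counters under /sys/devices/system/edac/mc and NIC device links under /sys/class/net; the function names are made up for illustration.

```python
import os
from pathlib import Path


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from the EDAC sysfs tree.

    An empty dict means no memory controller is registered with EDAC,
    which would be consistent with ECC reporting not being active.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


def nic_bus_addresses(net_root="/sys/class/net"):
    """Return {interface: bus address} for interfaces backed by a device.

    For PCI NICs the address looks like 0000:2b:00.0; purely virtual
    interfaces (lo, bridges) have no "device" link and are skipped.
    """
    addresses = {}
    root = Path(net_root)
    if not root.is_dir():
        return addresses
    for iface in sorted(root.iterdir()):
        device = iface / "device"
        if device.exists():
            # "device" is a symlink into the bus device directory
            addresses[iface.name] = os.path.basename(os.path.realpath(device))
    return addresses


if __name__ == "__main__":
    print("EDAC counters:", edac_error_counts() or "no controller registered")
    for name, addr in nic_bus_addresses().items():
        print(f"{name}: {addr}")
```

Running it once before and once after a CPU swap and comparing the output would show whether the NIC addresses shifted, and whether an EDAC memory controller shows up at all.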