From: Hermann Himmelbauer <hermann@qwer.tk>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system
Date: Mon, 16 Nov 2020 18:21:24 +0100
Message-ID: <23e655af-534e-3066-e3cd-514421730aa2@qwer.tk>
In-Reply-To: <0e58d1d5-384b-55d2-9042-ae8c1e2ade6c@qwer.tk>

Hi,
In case someone is interested: the problem is now solved. The system
seems to be rock solid after ~2 months of testing:

I replaced the AMD Ryzen 3 3200G with an AMD Ryzen 5 3600 on one node and
with an AMD Ryzen 3 3100 on the two other nodes; now the problem is gone.

I don't really know why, but I can think of two reasons:

1) The 3200G does not support ECC, but I use ECC RAM. Maybe this led to
errors (although intensive memory testing with memtest86 did not report
anything).
2) The new CPUs do not have integrated graphics. I noticed
that the two onboard 10GBit Ethernet adapters now have different PCI
addresses with the new CPUs, and with the old CPUs these 10G adapters
were malfunctioning.
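
For anyone who wants to check these two guesses on their own node, here is a
minimal sketch (not something that was run on the nodes described here). It
assumes a Linux host where the kernel exposes EDAC memory controllers under
/sys/devices/system/edac/mc and network interfaces under /sys/class/net. The
function names are made up for illustration. An empty EDAC result only means
that no EDAC driver registered a memory controller; it does not prove that ECC
is inactive.

```python
import os


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs.

    Returns an empty dict if the directory is missing, i.e. no EDAC
    driver has registered a memory controller.
    """
    counts = {}
    if not os.path.isdir(edac_root):
        return counts
    for name in sorted(os.listdir(edac_root)):
        ce_path = os.path.join(edac_root, name, "ce_count")
        ue_path = os.path.join(edac_root, name, "ue_count")
        # Skip entries that are not memory controllers (no counter files).
        if os.path.isfile(ce_path) and os.path.isfile(ue_path):
            with open(ce_path) as f:
                corrected = int(f.read().strip())
            with open(ue_path) as f:
                uncorrected = int(f.read().strip())
            counts[name] = (corrected, uncorrected)
    return counts


def nic_pci_addresses(net_root="/sys/class/net"):
    """Return {interface: bus address} for interfaces backed by a device.

    For PCI NICs the address looks like "0000:2b:00.0". Virtual
    interfaces (lo, bridges, ...) have no "device" link and are skipped.
    """
    addresses = {}
    if not os.path.isdir(net_root):
        return addresses
    for iface in sorted(os.listdir(net_root)):
        dev_link = os.path.join(net_root, iface, "device")
        if os.path.exists(dev_link):
            # The link resolves to the device directory; its last path
            # component is the bus address.
            addresses[iface] = os.path.basename(os.path.realpath(dev_link))
    return addresses


if __name__ == "__main__":
    print("EDAC controllers:", edac_error_counts() or "none registered")
    for iface, addr in nic_pci_addresses().items():
        print(iface, addr)
```

Running this before and after a CPU swap and comparing the output would show
whether the NIC addresses moved and whether a memory controller reports
corrected errors.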

Many thanks for your input and help.

The ASRock Rack X470D4U2-2T is definitely stable now.

Best Regards,
Hermann

On 04.09.20 at 16:45, Hermann Himmelbauer wrote:
> Dear Proxmox users,
> 
> I'm trying to install a 3-node cluster (latest Proxmox/Ceph) and am
> experiencing random freezes. The node can either be completely frozen (no
> blinking cursor on console, no ping) or can get somewhat blocked / slow etc.
> 
> This happens most often on node 2 (approx. 3-4 times / day); node 3
> never got stuck within 14 days of runtime, node 1 once.
> 
> Unfortunately I did not find any way to trigger this behaviour. However,
> I *think* that this happens most often if I stress the machine in some
> way (performance test within a virtual machine) and then let it idle.
> 
> When the machine freezes completely, there is no logfile. However, if it
> is partially frozen, some info can be acquired via dmesg (see attached
> file). "device=2b:00.0" is an Intel 10GBit Ethernet adapter (X550T), so
> perhaps there is some driver issue regarding this Ethernet adapter?
> 
> The system consists of the following components:
> 
> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
> - ASRock Rack X470D4U2-2T (Mainboard)
> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
> Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
> (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
> 
> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
> 
> What I did so far (without success):
> 
> - Disabled C6, as I read that this CPU state can lead to unstable systems
> (via "python zenstates.py --c6-disable" -> still errors).
> - Updated my BIOS to the latest version (3.30)
> - Checked that the CPU + RAM are compatible with the mainboard (they are
> listed as compatible on the ASRock website)
> - Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
> - Memory test (memtest86, no errors)
> 
> Do you have any clue what could be the reason for these freezes? Should
> I think of some hardware error? Or is this some known Linux bug that can
> be fixed?
> 
> Best Regards,
> Hermann




Thread overview: 5+ messages
     [not found] <0e58d1d5-384b-55d2-9042-ae8c1e2ade6c@qwer.tk>
2020-09-07 11:21 ` Wolfgang Link
2020-09-07 18:44   ` Hermann Himmelbauer
2020-09-08  4:25     ` Wolfgang Link
2020-09-07 11:29 ` Chris Sutcliff
2020-11-16 17:21 ` Hermann Himmelbauer [this message]
