From: Hermann Himmelbauer <hermann@qwer.tk>
To: Wolfgang Link <w.link@proxmox.com>,
	Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system
Date: Mon, 7 Sep 2020 20:44:43 +0200
Message-ID: <2be2c52e-4c86-6ca6-9f4a-49a25315c994@qwer.tk>
In-Reply-To: <2073914112.857.1599477660280@webmail.proxmox.com>

Dear Wolfgang,
Thank you for your reply. Glad to hear that the board is stable for you.

My BIOS is at its default values, so no overclocking or the like. Did you
change any settings? Did you disable C6 in some way?
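
For comparing idle-state settings between nodes, here is a minimal
sketch that lists the idle states the kernel exposes for one CPU and
whether each is disabled. It assumes the standard Linux cpuidle sysfs
layout (cpuidle/stateN/name and cpuidle/stateN/disable); the function
name is hypothetical. The state names the kernel reports may not map one
to one onto the C6 setting that zenstates.py toggles, so treat the
output as a hint only.

```python
from pathlib import Path


def list_idle_states(cpu_dir):
    """Return a list of (name, disabled) for each cpuidle state under cpu_dir.

    cpu_dir is e.g. /sys/devices/system/cpu/cpu0 (assumed sysfs layout).
    An empty list means no cpuidle directory or no states were found.
    """
    states = []
    state_dirs = sorted(
        Path(cpu_dir).glob("cpuidle/state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 < state10
    )
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        # The kernel writes "0" when the state is enabled, non-zero otherwise.
        disabled = (state / "disable").read_text().strip() != "0"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    for name, disabled in list_idle_states("/sys/devices/system/cpu/cpu0"):
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

Running this on a stable node and on a freezing node would show whether
their idle-state configuration differs.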

Maybe this really is a hardware defect (mainboard, RAM, CPU, power
supply...): since my posting I managed to crash node 2, whereas node 1
and node 3 are stable.

BTW, did you manage to get ECC running? I do have ECC memory, but it
does not seem to be detected. Maybe this is due to the AMD Ryzen 3 3200G:
I read somewhere that the CPUs with integrated graphics do not report ECC.
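
One way to check whether the kernel sees ECC is to look for memory
controllers registered with the EDAC subsystem. The following is a
minimal sketch, assuming the usual sysfs location
/sys/devices/system/edac/mc with mcN/ce_count and mcN/ue_count files;
the function name is hypothetical. An empty result only suggests that no
EDAC driver registered a controller; it could also mean the driver
module is simply not loaded.

```python
from pathlib import Path


def edac_controllers(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected_errors, uncorrected_errors)}.

    An empty dict means no memory controller is registered under edac_root.
    """
    result = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        # ce_count / ue_count hold the totals of corrected / uncorrected errors.
        ce = int((mc / "ce_count").read_text())
        ue = int((mc / "ue_count").read_text())
        result[mc.name] = (ce, ue)
    return result


if __name__ == "__main__":
    controllers = edac_controllers()
    if not controllers:
        print("No EDAC memory controller registered")
    for name, (ce, ue) in controllers.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```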

Could you perhaps send me a list of the other components in your system?

The board and the AMD CPUs are a very cost-efficient combination.
The onboard 10 Gbit Ethernet is great for Ceph; I get quite good I/O
speeds. Once things are stable, I think it is a perfect combination for
a cost-efficient HA cluster.

Best Regards,
Hermann

On 07.09.20 at 13:21, Wolfgang Link wrote:
> Hi Hermann,
> 
> This board, with this BIOS version and a Ryzen 9 3900X, has been running perfectly for over 4 months, even under very high load in the VM.
> 
> What have you set in the BIOS?
> 
> Regards
> 
> Wolfgang
>> On 09/04/2020 4:45 PM Hermann Himmelbauer <hermann@qwer.tk> wrote:
>>
>>  
>> Dear Proxmox users,
>>
>> I'm trying to set up a 3-node cluster (latest Proxmox/Ceph) and am
>> experiencing random freezes. A node is either completely frozen (no
>> blinking cursor on the console, no ping) or becomes partially blocked or slow.
>>
>> This happens most often on node 2 (approx. 3-4 times per day); node 3
>> never got stuck within 14 days of runtime, node 1 once.
>>
>> Unfortunately I have not found a way to trigger this behaviour; however,
>> I *think* it happens most often when I stress the machine in some
>> way (performance test within a virtual machine) and then let it idle.
>>
>> When the machine freezes completely, nothing is logged. However, if it
>> is only partially frozen, some info can be acquired via dmesg (see
>> attached file). "device=2b:00.0" is an Intel 10 Gbit Ethernet adapter
>> (X550T), so perhaps there is a driver issue with this adapter?
>>
>> The system consists of the following components:
>>
>> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
>> - ASRock Rack X470D4U2-2T (Mainboard)
>> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
>> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
>> Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
>> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
>> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
>> (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
>> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
>>
>> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
>> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
>>
>> What I did so far (without success):
>>
>> - Disabled C6, as I read that this CPU state can lead to unstable systems
>> (via "python zenstates.py --c6-disable" -> still errors).
>> - Updated my BIOS to the latest version (3.30)
>> - Checked that the CPU + RAM are compatible with the mainboard (they are
>> listed as compatible on the ASRock website)
>> - Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
>> - Memory test (memtest86, no errors)
>>
>> Do you have any clue what could be the reason for these freezes? Should
>> I suspect a hardware fault? Or is this a known Linux bug that can
>> be fixed?
>>
>> Best Regards,
>> Hermann
>>
>> -- 
>> hermann@qwer.tk
>> PGP/GPG: 299893C7 (on keyservers)
> 

-- 
hermann@qwer.tk
PGP/GPG: 299893C7 (on keyservers)



