From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pve-user-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 19D5E1FF173
	for <inbox@lore.proxmox.com>; Mon, 25 Nov 2024 06:32:22 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id D402FA264;
	Mon, 25 Nov 2024 06:32:25 +0100 (CET)
Date: Mon, 25 Nov 2024 06:32:16 +0100
To: Proxmox VE user list <pve-user@lists.proxmox.com>
In-Reply-To: <CA+U74VPYtp8uS2sC515wMHc5qc6tfjzRnRtWbxMyVtRdNTD4SQ@mail.gmail.com>
References: <CA+U74VPYtp8uS2sC515wMHc5qc6tfjzRnRtWbxMyVtRdNTD4SQ@mail.gmail.com>
MIME-Version: 1.0
Message-ID: <mailman.607.1732512744.391.pve-user@lists.proxmox.com>
List-Id: Proxmox VE user list <pve-user.lists.proxmox.com>
List-Post: <mailto:pve-user@lists.proxmox.com>
From: Alwin Antreich via pve-user <pve-user@lists.proxmox.com>
Precedence: list
Cc: Alwin Antreich <alwin@antreich.com>
X-Mailman-Version: 2.1.29
X-BeenThere: pve-user@lists.proxmox.com
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=subscribe>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-user/>
Reply-To: Proxmox VE user list <pve-user@lists.proxmox.com>
List-Help: <mailto:pve-user-request@lists.proxmox.com?subject=help>
Subject: Re: [PVE-User] VMs With Multiple Interfaces Rebooting
Content-Type: multipart/mixed; boundary="===============7671032088060367095=="
Errors-To: pve-user-bounces@lists.proxmox.com
Sender: "pve-user" <pve-user-bounces@lists.proxmox.com>

--===============7671032088060367095==
Content-Type: message/rfc822
Content-Disposition: inline

Return-Path: <alwin@antreich.com>
X-Original-To: pve-user@lists.proxmox.com
Delivered-To: pve-user@lists.proxmox.com
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits))
	(No client certificate requested)
	by lists.proxmox.com (Postfix) with ESMTPS id C9BCBC90F1
	for <pve-user@lists.proxmox.com>; Mon, 25 Nov 2024 06:32:24 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id AA459A20C
	for <pve-user@lists.proxmox.com>; Mon, 25 Nov 2024 06:32:24 +0100 (CET)
Received: from mx.antreich.com (mx.antreich.com [173.249.42.230])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by firstgate.proxmox.com (Proxmox) with ESMTPS
	for <pve-user@lists.proxmox.com>; Mon, 25 Nov 2024 06:32:23 +0100 (CET)
Received: from mail2.antreich.com (unknown [172.16.9.25])
	by mx.antreich.com (Postfix) with ESMTPS id EEF3A6E2E34
	for <pve-user@lists.proxmox.com>; Mon, 25 Nov 2024 06:32:16 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=antreich.com;
	s=2018; t=1732512737;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=MP1y3+o++WDKhSfMpgPYSK5v2hIYsWa1lj2EtqKAiQU=;
	b=Q1YwviufGYLG/x9ZgQQKm1hMVa/ME/x9DrqBhPYehsCEqZ1ka1tQTjaTlc3qnlJaoqFt+C
	UUHWUvLDiKtwmMPLnAl6pPsvVjoLbtwDQMjntRpAwqHowqLcHV1jGciJlBY+xSY1T6by/Y
	tp2akgP1ZqmsxiYgiaEgnlc2YufN6TOa/QcUO1y78MVeki/Fklv9UUHbnTdBWjXII8KY6x
	sI6IRcv0qgjo/yw88v6dJEbsu6CJ2qL+RCi5cgRRUnoJXKk2XIUNY9URGTK6Jos26ElzNQ
	ZR9d4XFDrO0YXarP3FBYePNaTUfl3LuUSsjENz9grR3K1uSbmjh4hq3Cp9NBuA==
Date: Mon, 25 Nov 2024 06:32:16 +0100
From: Alwin Antreich <alwin@antreich.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] VMs With Multiple Interfaces Rebooting
In-Reply-To: <CA+U74VPYtp8uS2sC515wMHc5qc6tfjzRnRtWbxMyVtRdNTD4SQ@mail.gmail.com>
References: <CA+U74VPYtp8uS2sC515wMHc5qc6tfjzRnRtWbxMyVtRdNTD4SQ@mail.gmail.com>
Message-ID: <254CB7A1-E72D-442B-9956-721A4D66BEAE@antreich.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
	AWL                     0.108 Adjusted score from AWL reputation of From: address
	BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
	DKIM_SIGNED               0.1 Message has a DKIM or DK signature, not necessarily valid
	DKIM_VALID               -0.1 Message has at least one valid DKIM or DK signature
	DKIM_VALID_AU            -0.1 Message has a valid DKIM or DK signature from author's domain
	DKIM_VALID_EF            -0.1 Message has a valid DKIM or DK signature from envelope-from domain
	DMARC_PASS               -0.1 DMARC pass policy
	RCVD_IN_VALIDITY_CERTIFIED_BLOCKED  0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked.  See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	RCVD_IN_VALIDITY_RPBL_BLOCKED  0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked.  See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	RCVD_IN_VALIDITY_SAFE_BLOCKED  0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked.  See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information.
	SPF_HELO_PASS          -0.001 SPF: HELO matches SPF record
	SPF_PASS               -0.001 SPF: sender matches SPF record

On November 22, 2024 7:16:53 AM GMT+01:00, JR Richardson
<jmr.richardson@gmail.com> wrote:
>Hey Folks,
>
>Just wanted to share an experience I recently had, Cluster parameters:
>7 nodes, 2 HA Groups (3 nodes and 4 nodes), shared storage.
>Server Specs:
>CPU(s) 40 x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 Sockets)
>Kernel Version Linux 6.8.12-1-pve (2024-08-05T16:17Z)
>Manager Version pve-manager/8.2.4/faa83925c9641325
>
>Super stable environment for many years through software and hardware
>upgrades, few issues to speak of, then without warning one of my
>hypervisors in 3 node group crashed with a memory dimm error, cluster
>HA took over and restarted the VMs on the other two nodes in the group
>as expected. The problem quickly materialized as the VMs started
>rebooting quickly, a lot of network issues and notice of migration
>pending. I could not lockdown exactly what the root cause was. Notable

This sounds like the HA stack wanted to balance the load. Do you have CRS
active and/or static load scheduling?

>was these particular VMs all have multiple network interfaces. After
>several hours of not being able to get the current VMs stable, I tried
>spinning up new VMs on to no avail, reboots persisted on the new VMs.
>This seemed to only affect the VMs that were on the hypervisor that
>failed all other VMs across the cluster were fine.
>
>I have not installed any third-party monitoring software, found a few
>post in the forum about it, but was not my issue.
>
>In an act of desperation, I performed a dist-upgrade and this solved
>the issue straight away.
>Kernel Version Linux 6.8.12-4-pve (2024-11-06T15:04Z)
>Manager Version pve-manager/8.3.0/c1689ccb1065a83b

The upgrade likely restarted the pve-ha-lrm service, which could have
broken the migration cycle.

The systemd logs should give you a clue as to what was happening; the HA
stack logs its actions on the given node.

Cheers,
Alwin
Hi JR,


--===============7671032088060367095==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


--===============7671032088060367095==--