From: Alwin Antreich
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Date: Mon, 25 Nov 2024 06:32:16 +0100
Subject: Re: [PVE-User] VMs With Multiple Interfaces Rebooting
Message-ID: <254CB7A1-E72D-442B-9956-721A4D66BEAE@antreich.com>

Hi JR,

On November 22, 2024 7:16:53 AM GMT+01:00, JR Richardson wrote:
>Hey Folks,
>
>Just wanted to share an experience I recently had, Cluster parameters:
>7 nodes, 2 HA Groups (3 nodes and 4 nodes), shared storage.
>Server Specs:
>CPU(s) 40 x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 Sockets)
>Kernel Version Linux 6.8.12-1-pve (2024-08-05T16:17Z)
>Manager Version pve-manager/8.2.4/faa83925c9641325
>
>Super stable environment for many years through software and hardware
>upgrades, few issues to speak of, then without warning one of my
>hypervisors in 3 node group crashed with a memory dimm error, cluster
>HA took over and restarted the VMs on the other two nodes in the group
>as expected. The problem quickly materialized as the VMs started
>rebooting quickly, a lot of network issues and notice of migration
>pending. I could not lockdown exactly what the root cause was. Notable

This sounds like it wanted to balance the load. Do you have CRS active
and/or static load scheduling?

>was these particular VMs all have multiple network interfaces. After
>several hours of not being able to get the current VMs stable, I tried
>spinning up new VMs on to no avail, reboots persisted on the new VMs.
>This seemed to only affect the VMs that were on the hypervisor that
>failed all other VMs across the cluster were fine.
>
>I have not installed any third-party monitoring software, found a few
>post in the forum about it, but was not my issue.
>
>In an act of desperation, I performed a dist-upgrade and this solved
>the issue straight away.
>Kernel Version Linux 6.8.12-4-pve (2024-11-06T15:04Z)
>Manager Version pve-manager/8.3.0/c1689ccb1065a83b

The upgrade likely restarted the pve-ha-lrm service, which could have
broken the migration cycle.

The systemd logs should give you a clue as to what was happening; the HA
stack logs its actions on the given node.

Cheers,
Alwin
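
To follow up on the log check suggested above: a minimal Python sketch,
assuming the HA services run under the unit names pve-ha-lrm and
pve-ha-crm and that journalctl is available on the node. The helper
names (build_journalctl_cmd, ha_log_lines) and the example time window
are hypothetical and only for illustration; set the window to the time
the node failed.

```python
import shutil
import subprocess

# Assumed unit names of the HA stack: local and cluster resource manager.
HA_UNITS = ("pve-ha-lrm", "pve-ha-crm")


def build_journalctl_cmd(since, until=None, units=HA_UNITS):
    """Return the journalctl argument list for the given units and time window."""
    cmd = ["journalctl", "--no-pager", "--output=short-iso"]
    for unit in units:
        # Repeating -u makes journalctl show entries from any of the units.
        cmd += ["-u", unit]
    cmd += ["--since", since]
    if until:
        cmd += ["--until", until]
    return cmd


def ha_log_lines(since, until=None):
    """Run the query on this node; return output lines, or [] without journalctl."""
    if shutil.which("journalctl") is None:
        return []
    result = subprocess.run(
        build_journalctl_cmd(since, until),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

Run it on each node that took over VMs, for example
ha_log_lines("2024-11-22 00:00:00") with the placeholder timestamp
replaced by the real failure time. Reading the full journal usually
needs root or membership in the systemd-journal group.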