From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <mailman.5.1732532402.36715.pve-user@lists.proxmox.com>
In-Reply-To: <mailman.5.1732532402.36715.pve-user@lists.proxmox.com>
From: JR Richardson <jmr.richardson@gmail.com>
Date: Mon, 25 Nov 2024 09:08:17 -0600
Message-ID: <CA+U74VNt=fNn2vmy3JuqNOFG8DbHWjf7HxTu3MsN7S62FFMwBw@mail.gmail.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] VMs With Multiple Interfaces Rebooting
Reply-To: Proxmox VE user list <pve-user@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

> >Super stable environment for many years through software and hardware
> >upgrades, few issues to speak of. Then, without warning, one of my
> >hypervisors in a 3-node group crashed with a memory DIMM error. Cluster
> >HA took over and restarted the VMs on the other two nodes in the group
> >as expected. A problem quickly materialized: the VMs started rebooting
> >repeatedly, with a lot of network issues and notices of pending
> >migration. I could not pin down exactly what the root cause was. Notable
> This sounds like it wanted to balance the load. Do you have CRS active and/or static load scheduling?
CRS option is set to basic, not dynamic.

>
> >was that these particular VMs all have multiple network interfaces. After
> >several hours of not being able to get the current VMs stable, I tried
> >spinning up new VMs, to no avail; reboots persisted on the new VMs.
> >This seemed to affect only the VMs that were on the hypervisor that
> >failed; all other VMs across the cluster were fine.
> >
> >I have not installed any third-party monitoring software. I found a few
> >posts in the forum about it, but that was not my issue.
> >
> >In an act of desperation, I performed a dist-upgrade and this solved
> >the issue straight away.
> >Kernel Version Linux 6.8.12-4-pve (2024-11-06T15:04Z)
> >Manager Version pve-manager/8.3.0/c1689ccb1065a83b
> The upgrade likely restarted the pve-ha-lrm service, which could break the migration cycle.
>
> The systemd logs should give you a clue to what was happening; the HA stack logs the actions on the given node.
I don't see anything unusual in the LRM logs, just the VMs being started
over and over.

Here are the relevant syslog entries for one reboot cycle, from the
cleanup of the previous run through startup to the next cleanup.

2024-11-21T18:36:59.023578-06:00 vvepve13 qmeventd[3838]: Starting cleanup for 13101
2024-11-21T18:36:59.105435-06:00 vvepve13 qmeventd[3838]: Finished cleanup for 13101
2024-11-21T18:37:30.758618-06:00 vvepve13 pve-ha-lrm[1608]: successfully acquired lock 'ha_agent_vvepve13_lock'
2024-11-21T18:37:30.758861-06:00 vvepve13 pve-ha-lrm[1608]: watchdog active
2024-11-21T18:37:30.758977-06:00 vvepve13 pve-ha-lrm[1608]: status change wait_for_agent_lock => active
2024-11-21T18:37:30.789271-06:00 vvepve13 pve-ha-lrm[4337]: starting service vm:13101
2024-11-21T18:37:30.808204-06:00 vvepve13 pve-ha-lrm[4338]: start VM 13101: UPID:vvepve13:000010F2:00007AEA:673FD24A:qmstart:13101:root@pam:
2024-11-21T18:37:30.808383-06:00 vvepve13 pve-ha-lrm[4337]: <root@pam> starting task UPID:vvepve13:000010F2:00007AEA:673FD24A:qmstart:13101:root@pam:
2024-11-21T18:37:31.112154-06:00 vvepve13 systemd[1]: Started 13101.scope.
2024-11-21T18:37:32.802414-06:00 vvepve13 kernel: [  316.379944] tap13101i0: entered promiscuous mode
2024-11-21T18:37:32.846352-06:00 vvepve13 kernel: [  316.423935] vmbr0: port 10(tap13101i0) entered blocking state
2024-11-21T18:37:32.846372-06:00 vvepve13 kernel: [  316.423946] vmbr0: port 10(tap13101i0) entered disabled state
2024-11-21T18:37:32.846375-06:00 vvepve13 kernel: [  316.423990] tap13101i0: entered allmulticast mode
2024-11-21T18:37:32.847377-06:00 vvepve13 kernel: [  316.424825] vmbr0: port 10(tap13101i0) entered blocking state
2024-11-21T18:37:32.847391-06:00 vvepve13 kernel: [  316.424832] vmbr0: port 10(tap13101i0) entered forwarding state
2024-11-21T18:37:34.594397-06:00 vvepve13 kernel: [  318.172029] tap13101i1: entered promiscuous mode
2024-11-21T18:37:34.640376-06:00 vvepve13 kernel: [  318.217302] vmbr0: port 11(tap13101i1) entered blocking state
2024-11-21T18:37:34.640393-06:00 vvepve13 kernel: [  318.217310] vmbr0: port 11(tap13101i1) entered disabled state
2024-11-21T18:37:34.640396-06:00 vvepve13 kernel: [  318.217341] tap13101i1: entered allmulticast mode
2024-11-21T18:37:34.640398-06:00 vvepve13 kernel: [  318.218073] vmbr0: port 11(tap13101i1) entered blocking state
2024-11-21T18:37:34.640400-06:00 vvepve13 kernel: [  318.218077] vmbr0: port 11(tap13101i1) entered forwarding state
2024-11-21T18:37:35.819630-06:00 vvepve13 pve-ha-lrm[4337]: Task 'UPID:vvepve13:000010F2:00007AEA:673FD24A:qmstart:13101:root@pam:' still active, waiting
2024-11-21T18:37:36.249349-06:00 vvepve13 kernel: [  319.827024] tap13101i2: entered promiscuous mode
2024-11-21T18:37:36.291346-06:00 vvepve13 kernel: [  319.868406] vmbr0: port 12(tap13101i2) entered blocking state
2024-11-21T18:37:36.291365-06:00 vvepve13 kernel: [  319.868417] vmbr0: port 12(tap13101i2) entered disabled state
2024-11-21T18:37:36.291367-06:00 vvepve13 kernel: [  319.868443] tap13101i2: entered allmulticast mode
2024-11-21T18:37:36.291368-06:00 vvepve13 kernel: [  319.869185] vmbr0: port 12(tap13101i2) entered blocking state
2024-11-21T18:37:36.291369-06:00 vvepve13 kernel: [  319.869191] vmbr0: port 12(tap13101i2) entered forwarding state
2024-11-21T18:37:37.997394-06:00 vvepve13 kernel: [  321.575034] tap13101i3: entered promiscuous mode
2024-11-21T18:37:38.040384-06:00 vvepve13 kernel: [  321.617225] vmbr0: port 13(tap13101i3) entered blocking state
2024-11-21T18:37:38.040396-06:00 vvepve13 kernel: [  321.617236] vmbr0: port 13(tap13101i3) entered disabled state
2024-11-21T18:37:38.040400-06:00 vvepve13 kernel: [  321.617278] tap13101i3: entered allmulticast mode
2024-11-21T18:37:38.040402-06:00 vvepve13 kernel: [  321.618070] vmbr0: port 13(tap13101i3) entered blocking state
2024-11-21T18:37:38.040403-06:00 vvepve13 kernel: [  321.618077] vmbr0: port 13(tap13101i3) entered forwarding state
2024-11-21T18:37:38.248094-06:00 vvepve13 pve-ha-lrm[4337]: <root@pam> end task UPID:vvepve13:000010F2:00007AEA:673FD24A:qmstart:13101:root@pam: OK
2024-11-21T18:37:38.254144-06:00 vvepve13 pve-ha-lrm[4337]: service status vm:13101 started
2024-11-21T18:37:44.256824-06:00 vvepve13 QEMU[3794]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
2024-11-21T18:38:17.486394-06:00 vvepve13 kernel: [  361.063298] vmbr0: port 10(tap13101i0) entered disabled state
2024-11-21T18:38:17.486423-06:00 vvepve13 kernel: [  361.064099] tap13101i0 (unregistering): left allmulticast mode
2024-11-21T18:38:17.486426-06:00 vvepve13 kernel: [  361.064110] vmbr0: port 10(tap13101i0) entered disabled state
2024-11-21T18:38:17.510386-06:00 vvepve13 kernel: [  361.087517] vmbr0: port 11(tap13101i1) entered disabled state
2024-11-21T18:38:17.510400-06:00 vvepve13 kernel: [  361.087796] tap13101i1 (unregistering): left allmulticast mode
2024-11-21T18:38:17.510403-06:00 vvepve13 kernel: [  361.087805] vmbr0: port 11(tap13101i1) entered disabled state
2024-11-21T18:38:17.540386-06:00 vvepve13 kernel: [  361.117511] vmbr0: port 12(tap13101i2) entered disabled state
2024-11-21T18:38:17.540402-06:00 vvepve13 kernel: [  361.117817] tap13101i2 (unregistering): left allmulticast mode
2024-11-21T18:38:17.540404-06:00 vvepve13 kernel: [  361.117827] vmbr0: port 12(tap13101i2) entered disabled state
2024-11-21T18:38:17.561380-06:00 vvepve13 kernel: [  361.138518] vmbr0: port 13(tap13101i3) entered disabled state
2024-11-21T18:38:17.561394-06:00 vvepve13 kernel: [  361.138965] tap13101i3 (unregistering): left allmulticast mode
2024-11-21T18:38:17.561399-06:00 vvepve13 kernel: [  361.138977] vmbr0: port 13(tap13101i3) entered disabled state
2024-11-21T18:38:17.584412-06:00 vvepve13 systemd[1]: 13101.scope: Deactivated successfully.
2024-11-21T18:38:17.584619-06:00 vvepve13 systemd[1]: 13101.scope: Consumed 51.122s CPU time.
2024-11-21T18:38:18.522886-06:00 vvepve13 pvestatd[1476]: VM 13101 qmp command failed - VM 13101 not running
2024-11-21T18:38:18.523725-06:00 vvepve13 pve-ha-lrm[4889]: <root@pam> end task UPID:vvepve13:0000131A:00008A78:673FD272:qmstart:13104:root@pam: OK
2024-11-21T18:38:18.945142-06:00 vvepve13 qmeventd[4990]: Starting cleanup for 13101
2024-11-21T18:38:19.022405-06:00 vvepve13 qmeventd[4990]: Finished cleanup for 13101
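
For anyone who wants to measure a cycle like this from their own syslog, here is a rough Python sketch (not a Proxmox tool). It tolerates entries that were hard-wrapped in transit, then reports how long after the LRM's "service status vm:<id> started" message the first QEMU assertion and the scope deactivation appear. The names unwrap() and summarize_cycle() are made up for illustration. Since the QEMU line carries no VMID, attributing the assertion to the VM is an assumption.

```python
import re
from datetime import datetime

# A new entry starts with an RFC 3339 timestamp; any other line is treated as
# the continuation of an entry that was hard-wrapped (for example by a mailer).
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}[+-]\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^\s\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def unwrap(lines):
    """Rejoin wrapped log lines into one string per entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TS_RE.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize_cycle(lines, vmid):
    """Seconds from 'HA reports VM started' to the first QEMU assertion
    and to the VM scope being deactivated (None where not seen)."""
    started = assertion = stopped = None
    for entry in unwrap(lines):
        m = ENTRY_RE.match(entry)
        if m is None:
            continue
        ts = datetime.fromisoformat(m["ts"])  # needs Python 3.7+
        prog, msg = m["prog"], m["msg"]
        if prog == "pve-ha-lrm" and msg == f"service status vm:{vmid} started":
            started, assertion, stopped = ts, None, None
        elif started is None:
            continue
        elif prog == "QEMU" and "Assertion" in msg and assertion is None:
            # Assumption: the QEMU line has no VMID, so the first assertion
            # after the start message is attributed to this VM.
            assertion = ts
        elif prog == "systemd" and msg == f"{vmid}.scope: Deactivated successfully.":
            stopped = ts
            break

    def delta(t):
        return None if t is None else (t - started).total_seconds()

    return {"to_assertion_s": delta(assertion), "to_stop_s": delta(stopped)}


# Three entries taken from the log above, in their wrapped form.
SAMPLE = """\
2024-11-21T18:37:38.254144-06:00 vvepve13 pve-ha-lrm[4337]: service
status vm:13101 started
2024-11-21T18:37:44.256824-06:00 vvepve13 QEMU[3794]: kvm:
../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret
== 0' failed.
2024-11-21T18:38:17.584412-06:00 vvepve13 systemd[1]: 13101.scope:
Deactivated successfully.
""".splitlines()

print(summarize_cycle(SAMPLE, 13101))
```

On the three sample entries this gives about 6.0 seconds from "started" to the assertion and about 39.3 seconds until the scope is deactivated. The timing suggests the HA start task itself succeeded and the VM died shortly afterwards, but that is only a reading of the timestamps.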


Thanks

JR
