Message-ID: <1a76ef0b-5b6e-2c2d-8702-cd889a378143@proxmox.com>
Date: Fri, 2 Jul 2021 22:57:44 +0200
From: Thomas Lamprecht
To: Proxmox VE user list, Mark Schouten
In-Reply-To: <0d129a03-9a70-e123-5e5a-e7862ef303ac@tuxis.nl>
Subject: Re: [PVE-User] Proxmox VE 7.0 (beta) released!

On 29.06.21 10:05, Mark Schouten wrote:
> Hi,
>
> On 24-06-2021 at 15:16, Martin Maurer wrote:
>> We are pleased to announce the first beta release of Proxmox Virtual Environment 7.0! The 7.x family is based on the great Debian 11 "Bullseye" and comes with a 5.11 kernel, QEMU 6.0, LXC 4.0, OpenZFS 2.0.4.
>
> I just upgraded a node in our demo cluster and all seemed fine. Except for non-working cluster network. I was unable to ping the node through the cluster interface, pvecm saw no other nodes and ceph was broken.
>
> However, if I ran tcpdump, ping started working, but not the rest.
>
> Interesting situation, which I 'fixed' by disabling vlan-aware-bridge for that interface. After the reboot, everything works (AFAICS).
>
> If Proxmox wants to debug this, feel free to reach out to me, I can grant you access to this node so you can check it out.
>

FYI, there was some more investigation regarding this, mostly spearheaded by
Wolfgang, and we found and fixed[0] an actual, rather old (the commit it fixes
is from 2014!), bridge bug in the kernel.

The first few lines of the fix's commit message[0] explain the basics:

> [..] bridges with `vlan_filtering 1` and only 1 auto-port don't
> set IFF_PROMISC for unicast-filtering-capable ports.

Further, we saw all that weird behavior because

* while this is independent of any specific network driver, those drivers
  vary wildly in how they do things, and some thus worked (by luck) while
  others did not.
* It can really only happen in the vlan-aware case, as otherwise all ports
  are set promisc no matter what; but depending on the order in which things
  are done, the result may still differ even with vlan-aware on.
* It did not matter before (i.e., before systemd started to also apply their
  MACAddressPolicy by default onto virtual devices like bridges) because then
  the bridge basically always had a MAC from one of its ports, so the fdb
  always contained the bridge's MAC implicitly and the bug was concealed.

So it's quite likely that this rather confusing mix of behaviors would have
popped up in more places where bridges are used over the upcoming months, once
that systemd change slowly rolled into stable distros, so it's actually really
nice to find and fix (*knocks wood*) this during beta!

Anyhow, a newer kernel build is now also available in the bullseye-based
pvetest repository, if you want to test and confirm the fix:

pve-kernel-5.11.22-1-pve version 5.11.22-2

cheers,
Thomas

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a019abd80220
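
PS: for anyone who wants to look at the symptom on their own node, here is a
minimal Python sketch (not part of the fix, and not an official tool) that
lists the ports of a bridge and reports whether each one currently has
IFF_PROMISC set. It assumes the usual sysfs layout, i.e. that
/sys/class/net/<bridge>/brif/ lists the ports and that each device's `flags`
file reflects promiscuity set by the kernel itself. The helper names and the
default bridge name `vmbr0` are only illustrative assumptions.

```python
from pathlib import Path

# Bit value of IFF_PROMISC as defined in <linux/if.h>.
IFF_PROMISC = 0x100


def is_promisc(flags_text):
    """Return True if IFF_PROMISC is set in a sysfs flags string like '0x1103'."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def bridge_ports(bridge, sysfs_root="/sys/class/net"):
    """List the port names of a bridge, or [] if it has no brif directory."""
    brif = Path(sysfs_root) / bridge / "brif"
    if not brif.is_dir():
        return []
    return sorted(entry.name for entry in brif.iterdir())


def port_promisc_report(bridge, sysfs_root="/sys/class/net"):
    """Map each port of the bridge to whether its flags include IFF_PROMISC."""
    report = {}
    for port in bridge_ports(bridge, sysfs_root):
        flags_text = (Path(sysfs_root) / port / "flags").read_text()
        report[port] = is_promisc(flags_text)
    return report


if __name__ == "__main__":
    # "vmbr0" is an assumed bridge name; adjust it to your setup.
    for port, promisc in port_promisc_report("vmbr0").items():
        print(f"{port}: {'promisc' if promisc else 'NOT promisc'}")
```

On an affected kernel, a vlan-aware bridge with a single port that reports
"NOT promisc" would be consistent with the bug described in [0], but treat
that as a hint rather than a definitive diagnosis.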