From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 8DC787193B
 for <pve-user@lists.proxmox.com>; Tue, 29 Jun 2021 16:15:35 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 83660FC0F
 for <pve-user@lists.proxmox.com>; Tue, 29 Jun 2021 16:15:05 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 55E83FC01
 for <pve-user@lists.proxmox.com>; Tue, 29 Jun 2021 16:15:04 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 1B076467F4;
 Tue, 29 Jun 2021 16:15:04 +0200 (CEST)
Message-ID: <db987d1d-a207-19ed-3cbc-87417da3ff77@proxmox.com>
Date: Tue, 29 Jun 2021 16:14:50 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101
 Thunderbird/90.0
Content-Language: en-US
To: Proxmox VE user list <pve-user@lists.proxmox.com>,
 Stoiko Ivanov <s.ivanov@proxmox.com>, Mark Schouten <mark@tuxis.nl>
References: <5377d815-bde4-9ca8-8584-ff63a6eb27ba@proxmox.com>
 <0d129a03-9a70-e123-5e5a-e7862ef303ac@tuxis.nl>
 <a602b355-4209-6e75-a25c-f7a98418d29e@proxmox.com>
 <ca6e6ebc-525d-5c7a-d481-608bba6737ed@tuxis.nl>
 <a81f3332-0e34-96b8-d195-124916547ebf@proxmox.com>
 <152e5dc5-8b0c-f182-4d85-1e1b3639209a@tuxis.nl>
 <20210629153111.2a0fbc28@rosa.proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <20210629153111.2a0fbc28@rosa.proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.571 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [PVE-User] Proxmox VE 7.0 (beta) released!
X-BeenThere: pve-user@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE user list <pve-user.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-user/>
List-Post: <mailto:pve-user@lists.proxmox.com>
List-Help: <mailto:pve-user-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user>, 
 <mailto:pve-user-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 29 Jun 2021 14:15:35 -0000

On 29.06.21 15:31, Stoiko Ivanov wrote:
> On Tue, 29 Jun 2021 14:04:05 +0200
> Mark Schouten <mark@tuxis.nl> wrote:
>
>> Hi,
>>
>> Op 29-06-2021 om 12:31 schreef Thomas Lamprecht:
>>>> I do not completely understand why that fixes it though. Commenting
>>>> out MACAddressPolicy=persistent helps, but why?
>>>>
>>>
>>> Because duplicate MAC addresses are not ideal, to say the least?
>>
>> That I understand. :)
>>
>> But, the cluster interface works when bridge_vlan_aware is off,
>> regardless of the MacAddressPolicy setting.
>>
>
> We managed to find a reproducer - my current guess is that it might have
> something to do with intel NIC drivers or some changes in ifupdown2 (or
> udev, or in their interaction ;) - Sadly if tcpdump fixes the issues, it
> makes debugging quite hard :)

The issue is that the kernel has always (since close to forever) cleared
promisc mode on the bridge's ports when there was either no port or
exactly one port with flood or learning enabled, in the
`br_manage_promisc` function.

Further, on toggling VLAN-aware the aforementioned `br_manage_promisc` is
called from `br_vlan_filter_toggle`.

So, why does this break now? I really do not think it's due to some
driver-specific stuff; that is not impossible, but the following sounds
like a better explanation for the "why now":

Previously the MAC address of the bridge was the same as the one from the
single port, so it didn't matter too much whether promisc was on for the
single port itself; the bridge could accept the packets. But now, with the
systemd default MACAddressPolicy "persistent" also applying to bridges,
the bridge gets a different MAC than the port, which means the disabled
promisc matters on that port quite a bit more.

So vlan-aware on "breaks" it by mistake, as then a br_manage_promisc call
is made at a time where the "clear promisc for port" logic triggers, so it
is rather a side-effect than a real cause.
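
To make the above a bit more concrete, here is a rough Python model of
that decision. It is only a sketch, not the kernel code: `Port`,
`wants_promisc`, `port_promisc` and the interface name "eno1" are made up
for this example, and the "every port goes promisc when VLAN filtering is
off or the bridge itself is promisc" branch is an assumption chosen to
match what was seen in this thread (works with vlan-aware off, tcpdump
"fixes" it).

```python
from typing import NamedTuple


class Port(NamedTuple):
    name: str
    auto: bool  # True if flood or learning is enabled on the port


def wants_promisc(port, auto_count, bridge_promisc, vlan_filtering):
    # Assumption: without VLAN filtering, or with the bridge itself in
    # promisc mode, every port is simply set to promisc.
    if bridge_promisc or not vlan_filtering:
        return True
    # No auto port at all: promisc gets cleared on every port.
    if auto_count < 1:
        return False
    # Exactly one auto port: promisc gets cleared on that port only.
    if auto_count < 2 and port.auto:
        return False
    return True


def port_promisc(ports, bridge_promisc, vlan_filtering):
    # Map each port name to the promisc state it would end up with.
    return {
        p.name: wants_promisc(
            p,
            sum(1 for q in ports if q.auto),
            bridge_promisc,
            vlan_filtering,
        )
        for p in ports
    }


# Single-port bridge, vlan-aware off: the port stays promisc.
print(port_promisc([Port("eno1", True)], False, False))
# prints {'eno1': True}

# Same bridge with vlan-aware on: promisc is cleared on the port.
print(port_promisc([Port("eno1", True)], False, True))
# prints {'eno1': False}
```

In this model the second call is the case that bites once the bridge no
longer shares its MAC address with the single port.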

I'm quite tempted to drop the br_auto_port special case for the single
port case in the kernel as a fix, but need to think about this, and will
probably send that to LKML first to poke for some comments...