From: Josef Johansson
To: VELARTIS Philipp Dürhammer, "'pve-devel@lists.proxmox.com'"
Subject: Re: [pve-devel] BUG in vlan aware bridge
Date: Thu, 14 Oct 2021 07:14:43 +0200
List-Id: Proxmox VE development discussion

Hi,

I did some more digging, searching for 'bridge-nf-call-iptables fragmentation'.

Found these forum posts:

https://forum.proxmox.com/threads/net-bridge-bridge-nf-call-iptables-and-friends.64766/

https://forum.proxmox.com/threads/linux-bridge-reassemble-fragmented-packets.96432/

And this patch, which suggests they at least TRIED to get it fixed ;)

https://lists.linuxfoundation.org/pipermail/bridge/2019-August/012185.html

Best regards
Josef Johansson

On 10/13/21 16:32, VELARTIS Philipp Dürhammer wrote:
> If you stop the pve firewall service and echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables (you stop the netfilter hook)
> then it works for me also with tagged tap devices and vlan aware bridge. I think it is a kernel bug.
> What I don't understand is why more people are not reporting it...
>
>
> -----Original Message-----
> From: Josef Johansson
> Sent: Wednesday, 13 October 2021 16:19
> To: VELARTIS Philipp Dürhammer; 'pve-devel@lists.proxmox.com'
> Subject: Re: AW: [pve-devel] BUG in vlan aware bridge
>
> Hi,
>
> I can confirm that s > 12000 does not work on either
>
> size, tap(untagged, mtu 1500)->vlan-aware bridge(mtu 9000)->bond(mtu 9000), tap(tagged, mtu1500)->vlan-aware bridge(mtu 9000)->bond(mtu 9000)
>
> s > 12000, doesn't work, doesn't work
>
> s > 8000 , works, doesn't work
>
>
> The traffic (one defragmented packet) is just dropped between bridge and tap. I tried my NOTRACK and it didn't have any effect.
>
>
> We have either a bug in my mellanox cards here or the kernel. I don't think this is a normal case.
>
> Best regards
> Josef Johansson
>
> On 10/13/21 15:53, VELARTIS Philipp Dürhammer wrote:
>> And what happens if you use packet size > 9000? this should still
>> work...(because it gets fragmented)
>>
>> -----Original Message-----
>> From: pve-devel On Behalf Of
>> Josef Johansson
>> Sent: Wednesday, 13 October 2021 13:37
>> To: pve-devel@lists.proxmox.com
>> Subject: Re: [pve-devel] BUG in vlan aware bridge
>>
>> Hi,
>>
>> AFAIK it's netfilter that is doing defragmenting so that it can firewall.
>>
>> If you specify
>>
>> iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK
>>
>> iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK
>>
>> you should be able to make it ignore your packets.
>>
>>
>> As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.
>>
>> As you would assume the packet is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.
>>
>> Best regards
>> Josef Johansson
>>
>> On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
>>> Hi,
>>>
>>>
>>> Yes, I think it has nothing to do with the bonds but with the vlan aware bridge interface.
>>>
>>> I see this with ping -s 1500
>>>
>>> On tap interface:
>>> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 1480
>>> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>>>     37.16.72.52 > 77.244.240.131: ip-proto-1
>>>
>>> On vmbr0:
>>> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 2008
>>>
>>> On bond0 it's gone....
>>>
>>> But who is in charge of fragmenting the packets normally? The bridge itself? Netfilter?
>>>
>>> -----Original Message-----
>>> From: pve-devel On Behalf Of
>>> Stoyan Marinov
>>> Sent: Wednesday, 13 October 2021 00:46
>>> To: Proxmox VE development discussion
>>> Subject: Re: [pve-devel] BUG in vlan aware bridge
>>>
>>> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, are reassembled by netfilter and then re-injected into the bridge assembled (full size).
>>>
>>> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>>>
>>> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
>>> 2. While watching with tcpdump (inside the vm) I notice "ip reassembly time exceeded" messages being generated from the vm.
>>>
>>> I'll try to investigate a bit further tomorrow.
>>>
>>>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov wrote:
>>>>
>>>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about a year ago) I was using bonding on the mlx adapters and not using bonding on the intel ones.
>>>>
>>>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> we use HP servers with Intel cards or the standard HP NIC (I think
>>>>> also Intel)
>>>>>
>>>>> Also I see that I made a mistake:
>>>>>
>>>>> Setup working:
>>>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> is correct. (before I had also tagged)
>>>>>
>>>>> it should be :
>>>>>
>>>>> Setup not working:
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup working:
>>>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup also working:
>>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>>
>>>>> -----Original Message-----
>>>>> From: pve-devel On Behalf Of
>>>>> Stoyan Marinov
>>>>> Sent: Tuesday, 12 October 2021 13:16
>>>>> To: Proxmox VE development discussion
>>>>> Subject: Re: [pve-devel] BUG in vlan aware bridge
>>>>>
>>>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>>>
>>>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have been playing around for days because we have strange packet losses.
>>>>>> Finally I can report the following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>>>
>>>>>> Packets with sizes > 1500 without VLAN work well, but the moment they are tagged they are dropped by the bond device.
>>>>>> Netfilter (set to 1) always reassembles the packets when they arrive at a bridge. But they don't get fragmented again if they are VLAN tagged. So the bond device drops them. If the bridge is NOT VLAN aware they also get fragmented and it works well.
>>>>>>
>>>>>> Setup not working:
>>>>>>
>>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>>
>>>>>> Setup working:
>>>>>>
>>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>>
>>>>>> Setup also working:
>>>>>>
>>>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>>>
>>>>>> Have you got any idea where to search? I don't understand who is
>>>>>> in charge of fragmenting packets again if they get reassembled by
>>>>>> netfilter. (and why it is not working with vlan aware bridges)
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> pve-devel mailing list
>>>>>> pve-devel@lists.proxmox.com
>>>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
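
For reference, the lengths in the tcpdump output quoted above can be reproduced with a little arithmetic. The sketch below is not taken from the thread: the function name `fragment_ping` and the `Fragment` type are made up for illustration, and it assumes a 20-byte IPv4 header without options plus an 8-byte ICMP echo header. Under those assumptions, a 2000-byte ping payload on a 1500 MTU link gives fragments of IP length 1500 (offset 0, more-fragments set) and 548 (offset 1480), and a reassembled IP length of 2028, which matches what was captured on the tap interface and on vmbr0.

```python
from __future__ import annotations

from dataclasses import dataclass

IP_HEADER = 20    # assumption: IPv4 header without options
ICMP_HEADER = 8   # ICMP echo request header


@dataclass
class Fragment:
    offset: int            # byte offset of this fragment's payload
    total_length: int      # IPv4 total length (header + payload)
    more_fragments: bool   # True when the MF flag would be set


def fragment_ping(ping_size: int, mtu: int = 1500) -> list[Fragment]:
    """Split one echo request sent with `ping -s ping_size` into IPv4 fragments."""
    payload = ping_size + ICMP_HEADER
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(per_fragment, payload - offset)
        more = offset + chunk < payload
        fragments.append(Fragment(offset, chunk + IP_HEADER, more))
        offset += chunk
    return fragments


if __name__ == "__main__":
    for frag in fragment_ping(2000):
        print(frag)
```

The reassembled 2028-byte packet is larger than any 1500 MTU device on the path can send, so it has to be fragmented again before it leaves through bond0; that is the step reported as missing for tagged traffic on the vlan aware bridge.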