public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] BUG in vlan aware bridge
@ 2021-10-12 10:48 VELARTIS Philipp Dürhammer
  2021-10-12 11:16 ` Stoyan Marinov
  0 siblings, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-12 10:48 UTC (permalink / raw)
  To: 'Proxmox VE development discussion'

Hi,

I have been playing around for days because we have strange packet losses.
Finally I can report the following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):

Packets larger than 1500 bytes work well without a VLAN, but the moment they are tagged they are dropped by the bond device.
Netfilter (bridge-nf-call-iptables set to 1) always reassembles the packets when they arrive at a bridge, but they do not get fragmented again if they are VLAN tagged, so the bond device drops them. If the bridge is NOT VLAN aware, they do get fragmented again and it works well.

Setup not working:

tapX (tagged) <- -> vmbr0 <- - > bond0

Setup working:

tapX (tagged) <- -> vmbr0 <- - > bond0

Setup also working:

tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0

Have you got any idea where to search? I don't understand what is in charge of fragmenting packets again after they get reassembled by netfilter (and why it does not work with VLAN-aware bridges).
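For reference, a small sketch of the fragment sizes involved here (my own illustration, not from the report): with MTU 1500, IPv4 fragments carry data in multiples of 8 bytes, so a "ping -s 2000" request splits into a 1500-byte and a 548-byte IP packet, matching the tcpdump output later in this thread.

```shell
# Sketch: how a "ping -s 2000" request splits at MTU 1500 (assumed values).
mtu=1500
ip_hdr=20
payload=$(( 8 + 2000 ))                    # ICMP header (8) + 2000 data bytes
frag_data=$(( (mtu - ip_hdr) / 8 * 8 ))    # per-fragment data, multiple of 8
echo "first fragment IP length:  $(( frag_data + ip_hdr ))"   # 1500
echo "second fragment offset:    $frag_data"                  # 1480
echo "second fragment IP length: $(( payload - frag_data + ip_hdr ))"  # 548
```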




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 10:48 [pve-devel] BUG in vlan aware bridge VELARTIS Philipp Dürhammer
@ 2021-10-12 11:16 ` Stoyan Marinov
  2021-10-12 12:36   ` VELARTIS Philipp Dürhammer
  0 siblings, 1 reply; 15+ messages in thread
From: Stoyan Marinov @ 2021-10-12 11:16 UTC (permalink / raw)
  To: Proxmox VE development discussion

I'm having the very same issue with Mellanox Ethernet adapters. I don't see this behavior with Intel NICs. What network cards do you have?

> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
> 
> HI,
> 
> i am playing around since days because we have strange packet losses.
> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
> 
> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
> 
> Setup not working:
> 
> tapX (tagged) <- -> vmbr0 <- - > bond0
> 
> Setup working:
> 
> tapX (tagged) <- -> vmbr0 <- - > bond0
> 
> Setup also working:
> 
> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
> 
> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 11:16 ` Stoyan Marinov
@ 2021-10-12 12:36   ` VELARTIS Philipp Dürhammer
  2021-10-12 20:26     ` Stoyan Marinov
  0 siblings, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-12 12:36 UTC (permalink / raw)
  To: Proxmox VE development discussion

Hi,

we use HP servers with Intel cards or the standard HP NIC (I think also Intel).

Also I see that I made a mistake:

Setup working:
tapX (UNtagged) <- -> vmbr0 <- - > bond0

is correct. (Before, I had it tagged as well.)

It should be:

Setup not working:
 tapX (tagged) <- -> vmbr0 <- - > bond0

Setup working:
tapX (untagged) <- -> vmbr0 <- - > bond0

Setup also working:
tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0

-----Ursprüngliche Nachricht-----
Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
Gesendet: Dienstag, 12. Oktober 2021 13:16
An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Betreff: Re: [pve-devel] BUG in vlan aware bridge

I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?

> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
> 
> HI,
> 
> i am playing around since days because we have strange packet losses.
> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
> 
> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
> 
> Setup not working:
> 
> tapX (tagged) <- -> vmbr0 <- - > bond0
> 
> Setup working:
> 
> tapX (tagged) <- -> vmbr0 <- - > bond0
> 
> Setup also working:
> 
> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
> 
> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 12:36   ` VELARTIS Philipp Dürhammer
@ 2021-10-12 20:26     ` Stoyan Marinov
  2021-10-12 22:45       ` Stoyan Marinov
  0 siblings, 1 reply; 15+ messages in thread
From: Stoyan Marinov @ 2021-10-12 20:26 UTC (permalink / raw)
  To: Proxmox VE development discussion

That's an interesting observation. Now that I think about it, it could be caused by bonding and not by the underlying device. When I tested this (about a year ago) I was using bonding on the Mellanox adapters but not on the Intel ones.

> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
> 
> HI,
> 
> we use HP Server with Intel Cards or the standard hp nic ( ithink also intel)
> 
> Also I see the I did a mistake:
> 
> Setup working:
> tapX (UNtagged) <- -> vmbr0 <- - > bond0
> 
> is correct. (before I had also tagged) 
> 
> it should be :
> 
> Setup not working:
> tapX (tagged) <- -> vmbr0 <- - > bond0
> 
> Setup working:
> tapX (untagged) <- -> vmbr0 <- - > bond0
> 
> Setup also working:
> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
> 
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
> Gesendet: Dienstag, 12. Oktober 2021 13:16
> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
> 
> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
> 
>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>> 
>> HI,
>> 
>> i am playing around since days because we have strange packet losses.
>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>> 
>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>> 
>> Setup not working:
>> 
>> tapX (tagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup working:
>> 
>> tapX (tagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup also working:
>> 
>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>> 
>> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 20:26     ` Stoyan Marinov
@ 2021-10-12 22:45       ` Stoyan Marinov
  2021-10-12 23:03         ` Stoyan Marinov
  2021-10-13  9:22         ` VELARTIS Philipp Dürhammer
  0 siblings, 2 replies; 15+ messages in thread
From: Stoyan Marinov @ 2021-10-12 22:45 UTC (permalink / raw)
  To: Proxmox VE development discussion

OK, I have just verified that it has nothing to do with bonds. I get the same behavior with a VLAN-aware bridge, bridge-nf-call-iptables=1, and a regular eth0 as the bridge port. Packets arrive fragmented on the tap device, get reassembled by netfilter, and are then re-injected into the bridge fully assembled (full size).

I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:

1. I'm testing with ping -s 2000 (MTU 1500 everywhere) to an external box. I do see reply packets arrive on the VM NIC, but ping doesn't see them. I haven't analyzed much further.
2. While watching with tcpdump (inside the VM) I notice "ip reassembly time exceeded" messages being generated by the VM.

I'll try to investigate a bit further tomorrow.
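For anyone wanting to try the same partial workaround: the two knobs mentioned above are real bridge-netfilter sysctls, and could be set persistently roughly like this (a sketch; the file name is my own choice, and br_netfilter must be loaded for these keys to exist):

```shell
# /etc/sysctl.d/99-bridge-nf.conf (hypothetical file name)
# Reassembled packets traverse iptables on the bridge forwarding path:
net.bridge.bridge-nf-call-iptables = 1
# Also run bridge-netfilter for VLAN-tagged IP packets (the partial fix above):
net.bridge.bridge-nf-filter-vlan-tagged = 1
```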

> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
> 
> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
> 
>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>> 
>> HI,
>> 
>> we use HP Server with Intel Cards or the standard hp nic ( ithink also intel)
>> 
>> Also I see the I did a mistake:
>> 
>> Setup working:
>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>> 
>> is correct. (before I had also tagged) 
>> 
>> it should be :
>> 
>> Setup not working:
>> tapX (tagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup working:
>> tapX (untagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup also working:
>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>> 
>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>> 
>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>> 
>>> HI,
>>> 
>>> i am playing around since days because we have strange packet losses.
>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>> 
>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>> 
>>> Setup not working:
>>> 
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup working:
>>> 
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup also working:
>>> 
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>> 
>>> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
>>> 
>>> 
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>> 
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 22:45       ` Stoyan Marinov
@ 2021-10-12 23:03         ` Stoyan Marinov
  2021-10-13  9:22         ` VELARTIS Philipp Dürhammer
  1 sibling, 0 replies; 15+ messages in thread
From: Stoyan Marinov @ 2021-10-12 23:03 UTC (permalink / raw)
  To: Proxmox VE development discussion

Alright, I said tomorrow, but I did a bit more fiddling with tcpdump, and it's quite odd:
01:59:31.462866 d2:24:de:13:3d:6e > c6:29:5b:54:e3:b9, ethertype IPv4 (0x0800), length 1514: 10.3.4.111 > 192.168.0.220: ICMP echo reply, id 112, seq 29, length 1480
01:59:31.462866 d2:24:de:13:3d:6e > c6:29:5b:54:e3:b9, ethertype 802.1Q (0x8100), length 566: vlan 10, p 0, ethertype IPv4, 10.3.4.111 > 192.168.0.220: ip-proto-1
01:59:32.486719 d2:24:de:13:3d:6e > c6:29:5b:54:e3:b9, ethertype IPv4 (0x0800), length 1514: 10.3.4.111 > 192.168.0.220: ICMP echo reply, id 112, seq 30, length 1480
01:59:32.486719 d2:24:de:13:3d:6e > c6:29:5b:54:e3:b9, ethertype 802.1Q (0x8100), length 566: vlan 10, p 0, ethertype IPv4, 10.3.4.111 > 192.168.0.220: ip-proto-1

It seems the first fragment arrives properly, while the second one is a VLAN-tagged Ethernet frame. That's weird.
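As a sanity check on the sizes above (my own arithmetic, not from the capture): assuming the second fragment is the 548-byte IP packet from a ping -s 2000 exchange, 566 is exactly its IP length plus the Ethernet header plus the 802.1Q tag, so the fragment itself looks intact; only the unexpected tag distinguishes it from the first, untagged 1514-byte frame.

```shell
# Sketch: expected frame lengths for the two captured fragments above.
ip_len=548    # assumed IP total length of the second fragment (20 hdr + 528 data)
eth_hdr=14    # Ethernet header
vlan_tag=4    # 802.1Q tag
echo "untagged first frame: $(( 1500 + eth_hdr ))"             # 1514, as captured
echo "tagged second frame:  $(( ip_len + eth_hdr + vlan_tag ))"  # 566, as captured
```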

On top of all that, I noticed that if I run tcpdump on the Proxmox host, listening on vmbr0, it all works!


> On 13 Oct 2021, at 1:45 AM, Stoyan Marinov <stoyan@marinov.us> wrote:
> 
> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
> 
> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
> 
> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
> 
> I'll try to investigate a bit further tomorrow.
> 
>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>> 
>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
>> 
>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>> 
>>> HI,
>>> 
>>> we use HP Server with Intel Cards or the standard hp nic ( ithink also intel)
>>> 
>>> Also I see the I did a mistake:
>>> 
>>> Setup working:
>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>> 
>>> is correct. (before I had also tagged) 
>>> 
>>> it should be :
>>> 
>>> Setup not working:
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup working:
>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup also working:
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>> 
>>> -----Ursprüngliche Nachricht-----
>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>> 
>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>> 
>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>> 
>>>> HI,
>>>> 
>>>> i am playing around since days because we have strange packet losses.
>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>> 
>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>> 
>>>> Setup not working:
>>>> 
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>> 
>>>> Setup working:
>>>> 
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>> 
>>>> Setup also working:
>>>> 
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>> 
>>>> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
>>>> 
>>>> 
>>>> _______________________________________________
>>>> pve-devel mailing list
>>>> pve-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-12 22:45       ` Stoyan Marinov
  2021-10-12 23:03         ` Stoyan Marinov
@ 2021-10-13  9:22         ` VELARTIS Philipp Dürhammer
  2021-10-13 11:36           ` Josef Johansson
  1 sibling, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-13  9:22 UTC (permalink / raw)
  To: 'Proxmox VE development discussion'

Hi,


Yes, I think it has nothing to do with the bonds but with the VLAN-aware bridge interface.

I see this with ping -s 1500:

On the tap interface:
11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
    37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 1480
11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
    37.16.72.52 > 77.244.240.131: ip-proto-1

On vmbr0:
11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
    37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 2008

On bond0 it's gone...

But who is in charge of fragmenting the packets normally? The bridge itself? Netfilter?

-----Ursprüngliche Nachricht-----
Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
Gesendet: Mittwoch, 13. Oktober 2021 00:46
An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Betreff: Re: [pve-devel] BUG in vlan aware bridge

OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).

I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:

1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.

I'll try to investigate a bit further tomorrow.

> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
> 
> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
> 
>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>> 
>> HI,
>> 
>> we use HP Server with Intel Cards or the standard hp nic ( ithink also intel)
>> 
>> Also I see the I did a mistake:
>> 
>> Setup working:
>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>> 
>> is correct. (before I had also tagged) 
>> 
>> it should be :
>> 
>> Setup not working:
>> tapX (tagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup working:
>> tapX (untagged) <- -> vmbr0 <- - > bond0
>> 
>> Setup also working:
>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>> 
>> -----Ursprüngliche Nachricht-----
>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>> 
>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>> 
>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>> 
>>> HI,
>>> 
>>> i am playing around since days because we have strange packet losses.
>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>> 
>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>> 
>>> Setup not working:
>>> 
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup working:
>>> 
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>> 
>>> Setup also working:
>>> 
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>> 
>>> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
>>> 
>>> 
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>> 
>> 
>> 
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13  9:22         ` VELARTIS Philipp Dürhammer
@ 2021-10-13 11:36           ` Josef Johansson
  2021-10-13 13:47             ` VELARTIS Philipp Dürhammer
  2021-10-13 13:53             ` VELARTIS Philipp Dürhammer
  0 siblings, 2 replies; 15+ messages in thread
From: Josef Johansson @ 2021-10-13 11:36 UTC (permalink / raw)
  To: pve-devel

Hi,

AFAIK it's netfilter that does the defragmenting, so that it can apply firewall rules.

If you specify

iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK

iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK

you should be able to make it ignore your packets.


As a data point, I could ping fine from an MTU 1500 host, over MTU 9000
VLAN-aware bridges with firewalls, to another MTU 1500 host.

As you would expect, the packet is reassembled over the MTU 9000 links and
fragmented again at the MTU 1500 devices.

Med vänliga hälsningar
Josef Johansson

On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
> HI,
>
>
> Yes i think it has nothing to do with the bonds but with the vlan aware bridge interface.
>
> I see this with ping -s 1500
>
> On tap interface: 
> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 1480
> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>     37.16.72.52 > 77.244.240.131: ip-proto-1
>
> On vmbr0:
> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, length 2008
>
> On bond0 its gone....
>
> But who is in charge of fragementing the packets normally? The bridge itself? Netfilter?
>
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
> Gesendet: Mittwoch, 13. Oktober 2021 00:46
> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>
> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
>
> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>
> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
>
> I'll try to investigate a bit further tomorrow.
>
>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>
>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
>>
>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>
>>> HI,
>>>
>>> we use HP Server with Intel Cards or the standard hp nic ( ithink also intel)
>>>
>>> Also I see the I did a mistake:
>>>
>>> Setup working:
>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>
>>> is correct. (before I had also tagged) 
>>>
>>> it should be :
>>>
>>> Setup not working:
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup working:
>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup also working:
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Stoyan Marinov
>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>>
>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>
>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> i am playing around since days because we have strange packet losses.
>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>
>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>>
>>>> Setup not working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>>
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> Have you got any idea where to search? I don't understand who is in charge of fragmenting packages again if they get reassembled by netfilter. (and why it is not working with vlan aware bridges)
>>>>
>>>>
>>>> _______________________________________________
>>>> pve-devel mailing list
>>>> pve-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>>>
>>>
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 11:36           ` Josef Johansson
@ 2021-10-13 13:47             ` VELARTIS Philipp Dürhammer
  2021-10-13 13:53               ` Josef Johansson
  2021-10-13 13:53             ` VELARTIS Philipp Dürhammer
  1 sibling, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-13 13:47 UTC (permalink / raw)
  To: 'pve-devel@lists.proxmox.com'

>> As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.

>> As you would assume the package is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.

So you did a ping with -s 2000 (or bigger), and the tap device of the VM you ping from is VLAN tagged?
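For reference, the fragment sizes reported by tcpdump earlier in the thread can be checked by hand. A minimal sketch (hypothetical helper, assuming a 20-byte IPv4 header with no options):

```python
def ipv4_fragments(ip_payload_len, mtu=1500, ihl=20):
    """Split an IP payload into (offset, data_len, more_fragments) tuples."""
    max_data = ((mtu - ihl) // 8) * 8   # fragment data must be a multiple of 8
    frags, off = [], 0
    while off < ip_payload_len:
        data = min(max_data, ip_payload_len - off)
        off_next = off + data
        frags.append((off, data, off_next < ip_payload_len))
        off = off_next
    return frags

# ping -s 2000: 2000 bytes ICMP data + 8 bytes ICMP header = 2008 bytes IP payload
print(ipv4_fragments(2008))  # [(0, 1480, True), (1480, 528, False)]
```

The two fragments come out as 1500 and 548 bytes total IP length, matching the capture on the tap interface; the reassembled datagram is 2028 bytes, matching the capture on vmbr0.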

-----Ursprüngliche Nachricht-----
Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Josef Johansson
Gesendet: Mittwoch, 13. Oktober 2021 13:37
An: pve-devel@lists.proxmox.com
Betreff: Re: [pve-devel] BUG in vlan aware bridge

Hi,

AFAIK it's netfilter that does the defragmentation (reassembly) so that it can firewall.

If you specify

iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK

iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK

you should be able to make it ignore your packets.


As a datapoint, I could ping fine from an MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls, to another MTU 1500 host.

As you would assume, the packet is reassembled over the MTU 9000 links and fragmented again at the MTU 1500 devices.

Med vänliga hälsningar
Josef Johansson

On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
> HI,
>
>
> Yes, I think it has nothing to do with the bonds but with the VLAN-aware bridge interface.
>
> I see this with ping -s 1500
>
> On tap interface: 
> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
> length 1480
> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>     37.16.72.52 > 77.244.240.131: ip-proto-1
>
> On vmbr0:
> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
> length 2008
>
> On bond0 it's gone...
>
> But who is in charge of fragmenting the packets normally? The bridge itself? Netfilter?
>
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
> Stoyan Marinov
> Gesendet: Mittwoch, 13. Oktober 2021 00:46
> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>
> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
>
> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>
> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
>
> I'll try to investigate a bit further tomorrow.
>
>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>
>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about a year ago) I was using bonding on the mlx adapters and not using bonding on the intel ones.
>>
>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>
>>> HI,
>>>
>>> we use HP servers with Intel cards or the standard HP NIC (I think
>>> also Intel)
>>>
>>> Also I see that I made a mistake:
>>>
>>> Setup working:
>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>
>>> is correct (before, I had it tagged as well).
>>>
>>> It should be:
>>>
>>> Setup not working:
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup working:
>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup also working:
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>>> Stoyan Marinov
>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>>
>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>
>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> i am playing around since days because we have strange packet losses.
>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>
>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>>
>>>> Setup not working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>>
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> Have you got any idea where to search? I don't understand who is in 
>>>> charge of fragmenting packages again if they get reassembled by 
>>>> netfilter. (and why it is not working with vlan aware bridges)
>>>>
>>>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 13:47             ` VELARTIS Philipp Dürhammer
@ 2021-10-13 13:53               ` Josef Johansson
  0 siblings, 0 replies; 15+ messages in thread
From: Josef Johansson @ 2021-10-13 13:53 UTC (permalink / raw)
  To: pve-devel


Med vänliga hälsningar
Josef Johansson

On 10/13/21 15:47, VELARTIS Philipp Dürhammer wrote:
>>> As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.
>>> As you would assume the package is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.
> So you did a ping with -s 2000 (or bigger) and your tap device is vlan tagged from the vm where you ping?
Oh right. I have to test that out correctly. I have it in the lab, and will
reach back to you when I've tested it properly.
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Josef Johansson
> Gesendet: Mittwoch, 13. Oktober 2021 13:37
> An: pve-devel@lists.proxmox.com
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>
> Hi,
>
> AFAIK it's netfilter that is doing defragmenting so that it can firewall.
>
> If you specify
>
> iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK
>
> iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK
>
> you should be able to make it ignore your packets.
>
>
> As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.
>
> As you would assume the package is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.
>
> Med vänliga hälsningar
> Josef Johansson
>
> On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
>> HI,
>>
>>
>> Yes i think it has nothing to do with the bonds but with the vlan aware bridge interface.
>>
>> I see this with ping -s 1500
>>
>> On tap interface: 
>> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 1480
>> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>>     37.16.72.52 > 77.244.240.131: ip-proto-1
>>
>> On vmbr0:
>> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 2008
>>
>> On bond0 its gone....
>>
>> But who is in charge of fragementing the packets normally? The bridge itself? Netfilter?
>>
>> -----Ursprüngliche Nachricht-----
>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>> Stoyan Marinov
>> Gesendet: Mittwoch, 13. Oktober 2021 00:46
>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>
>> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
>>
>> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>>
>> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
>> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
>>
>> I'll try to investigate a bit further tomorrow.
>>
>>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>>
>>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
>>>
>>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> we use HP Server with Intel Cards or the standard hp nic ( ithink 
>>>> also intel)
>>>>
>>>> Also I see the I did a mistake:
>>>>
>>>> Setup working:
>>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> is correct. (before I had also tagged)
>>>>
>>>> it should be :
>>>>
>>>> Setup not working:
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> -----Ursprüngliche Nachricht-----
>>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>>>> Stoyan Marinov
>>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>>>
>>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>>
>>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>>
>>>>> HI,
>>>>>
>>>>> i am playing around since days because we have strange packet losses.
>>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>>
>>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>>>
>>>>> Setup not working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup also working:
>>>>>
>>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>>
>>>>> Have you got any idea where to search? I don't understand who is in 
>>>>> charge of fragmenting packages again if they get reassembled by 
>>>>> netfilter. (and why it is not working with vlan aware bridges)
>>>>>
>>>>>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 11:36           ` Josef Johansson
  2021-10-13 13:47             ` VELARTIS Philipp Dürhammer
@ 2021-10-13 13:53             ` VELARTIS Philipp Dürhammer
  2021-10-13 14:19               ` Josef Johansson
  1 sibling, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-13 13:53 UTC (permalink / raw)
  To: 'pve-devel@lists.proxmox.com'

And what happens if you use a packet size > 9000? This should still work... (because it gets fragmented)

-----Ursprüngliche Nachricht-----
Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Josef Johansson
Gesendet: Mittwoch, 13. Oktober 2021 13:37
An: pve-devel@lists.proxmox.com
Betreff: Re: [pve-devel] BUG in vlan aware bridge

Hi,

AFAIK it's netfilter that is doing defragmenting so that it can firewall.

If you specify

iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK

iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK

you should be able to make it ignore your packets.


As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.

As you would assume the package is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.

Med vänliga hälsningar
Josef Johansson

On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
> HI,
>
>
> Yes i think it has nothing to do with the bonds but with the vlan aware bridge interface.
>
> I see this with ping -s 1500
>
> On tap interface: 
> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
> length 1480
> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>     37.16.72.52 > 77.244.240.131: ip-proto-1
>
> On vmbr0:
> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
> length 2008
>
> On bond0 its gone....
>
> But who is in charge of fragementing the packets normally? The bridge itself? Netfilter?
>
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
> Stoyan Marinov
> Gesendet: Mittwoch, 13. Oktober 2021 00:46
> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>
> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
>
> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>
> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
>
> I'll try to investigate a bit further tomorrow.
>
>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>
>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
>>
>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>
>>> HI,
>>>
>>> we use HP Server with Intel Cards or the standard hp nic ( ithink 
>>> also intel)
>>>
>>> Also I see the I did a mistake:
>>>
>>> Setup working:
>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>
>>> is correct. (before I had also tagged)
>>>
>>> it should be :
>>>
>>> Setup not working:
>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup working:
>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>
>>> Setup also working:
>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>
>>> -----Ursprüngliche Nachricht-----
>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>>> Stoyan Marinov
>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>>
>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>
>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> i am playing around since days because we have strange packet losses.
>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>
>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>>
>>>> Setup not working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>>
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>>
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> Have you got any idea where to search? I don't understand who is in 
>>>> charge of fragmenting packages again if they get reassembled by 
>>>> netfilter. (and why it is not working with vlan aware bridges)
>>>>
>>>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 13:53             ` VELARTIS Philipp Dürhammer
@ 2021-10-13 14:19               ` Josef Johansson
  2021-10-13 14:32                 ` VELARTIS Philipp Dürhammer
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Johansson @ 2021-10-13 14:19 UTC (permalink / raw)
  To: VELARTIS Philipp Dürhammer, 'pve-devel@lists.proxmox.com'

Hi,

I can confirm that s > 12000 does not work in either setup:

  A: tap (untagged, MTU 1500) -> vlan-aware bridge (MTU 9000) -> bond (MTU 9000)
  B: tap (tagged, MTU 1500)   -> vlan-aware bridge (MTU 9000) -> bond (MTU 9000)

  size        A             B
  s > 12000   doesn't work  doesn't work
  s > 8000    works         doesn't work


The traffic (one reassembled packet) is just dropped between bridge and
tap. I tried my NOTRACK rules and they didn't have any effect.


We have either a bug in my Mellanox cards here or in the kernel. I don't
think this is normal behavior.
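The two size thresholds are consistent with the reassembly picture: after netfilter reassembles the datagram, it either fits the egress MTU in one frame or something has to refragment it. A small sketch of the sizes only (hypothetical helper, assuming a 20-byte IPv4 header):

```python
# Does a reassembled ping datagram fit in a single frame at the given MTU?
def fits(mtu, icmp_data):
    # 20-byte IPv4 header (no options) + 8-byte ICMP header + payload
    return 20 + 8 + icmp_data <= mtu

print(fits(9000, 8000))   # -s 8000 fits one jumbo frame after reassembly
print(fits(9000, 12000))  # -s 12000 exceeds even MTU 9000, must be refragmented
```

This matches the table: at MTU 9000, s > 8000 only fails on the tagged path (where refragmentation is broken), while s > 12000 fails on both.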

Med vänliga hälsningar
Josef Johansson

On 10/13/21 15:53, VELARTIS Philipp Dürhammer wrote:
> And what happens if you use packet size > 9000? this should still work...(because it gets fragmented)
>
> -----Ursprüngliche Nachricht-----
> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von Josef Johansson
> Gesendet: Mittwoch, 13. Oktober 2021 13:37
> An: pve-devel@lists.proxmox.com
> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>
> Hi,
>
> AFAIK it's netfilter that is doing defragmenting so that it can firewall.
>
> If you specify
>
> iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK
>
> iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK
>
> you should be able to make it ignore your packets.
>
>
> As a datapoint I could ping fine from a MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls to another MTU 1500.
>
> As you would assume the package is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.
>
> Med vänliga hälsningar
> Josef Johansson
>
> On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
>> HI,
>>
>>
>> Yes i think it has nothing to do with the bonds but with the vlan aware bridge interface.
>>
>> I see this with ping -s 1500
>>
>> On tap interface: 
>> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 1480
>> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>>     37.16.72.52 > 77.244.240.131: ip-proto-1
>>
>> On vmbr0:
>> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 2008
>>
>> On bond0 its gone....
>>
>> But who is in charge of fragementing the packets normally? The bridge itself? Netfilter?
>>
>> -----Ursprüngliche Nachricht-----
>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>> Stoyan Marinov
>> Gesendet: Mittwoch, 13. Oktober 2021 00:46
>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>
>> OK, I have just verified it has nothing to do with bonds. I get the same behavior with vlan aware bridge, bridge-nf-call-iptables=1 with regular eth0 being part of the bridge. Packets arrive fragmented on tap, reassembled by netfilter and then re-injected in bridge assembled (full size).
>>
>> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>>
>> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
>> 2. While watching with tcpdump (inside the vm) i notice "ip reassembly time exceeded" messages being generated from the vm.
>>
>> I'll try to investigate a bit further tomorrow.
>>
>>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>>
>>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about an year ago) I was using bonding on the mlx adapters and not using bonding on intel ones.
>>>
>>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> we use HP Server with Intel Cards or the standard hp nic ( ithink 
>>>> also intel)
>>>>
>>>> Also I see the I did a mistake:
>>>>
>>>> Setup working:
>>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> is correct. (before I had also tagged)
>>>>
>>>> it should be :
>>>>
>>>> Setup not working:
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> -----Ursprüngliche Nachricht-----
>>>> Von: pve-devel <pve-devel-bounces@lists.proxmox.com> Im Auftrag von 
>>>> Stoyan Marinov
>>>> Gesendet: Dienstag, 12. Oktober 2021 13:16
>>>> An: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>>> Betreff: Re: [pve-devel] BUG in vlan aware bridge
>>>>
>>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>>
>>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>>
>>>>> HI,
>>>>>
>>>>> i am playing around since days because we have strange packet losses.
>>>>> Finally I can report following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>>
>>>>> Packet with sizes > 1500 without VLAN working well but at the moment they are Tagged they are dropped by the bond device.
>>>>> Netfilter (set to 1) always reassembles the packets when they arrive a bridge. But they don't get fragmented again I they are VLAN tagged. So the bond device drops them. If the bridge is NOT Vlan aware they also get fragmented and it works well.
>>>>>
>>>>> Setup not working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup also working:
>>>>>
>>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>>
>>>>> Have you got any idea where to search? I don't understand who is in 
>>>>> charge of fragmenting packages again if they get reassembled by 
>>>>> netfilter. (and why it is not working with vlan aware bridges)
>>>>>
>>>>>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 14:19               ` Josef Johansson
@ 2021-10-13 14:32                 ` VELARTIS Philipp Dürhammer
  2021-10-14  5:14                   ` Josef Johansson
  0 siblings, 1 reply; 15+ messages in thread
From: VELARTIS Philipp Dürhammer @ 2021-10-13 14:32 UTC (permalink / raw)
  To: 'Josef Johansson', 'pve-devel@lists.proxmox.com'

If you stop the pve-firewall service and echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables (i.e. you disable the netfilter hook),
then it works for me also with tagged tap devices and a VLAN-aware bridge. I think it is a kernel bug.
What I don't understand is why more people are not reporting it...
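Spelled out as commands (assumes root on the PVE host; note this disables bridge netfilter entirely, so PVE firewall rules for guests stop being applied — for testing only):

```shell
# Disable the bridge netfilter hook so bridged frames bypass ip(6)tables.
# WARNING: this turns off VM firewalling on this host -- testing only.
systemctl stop pve-firewall
echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables
# equivalently:
# sysctl -w net.bridge.bridge-nf-call-iptables=0
```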


-----Ursprüngliche Nachricht-----
Von: Josef Johansson <josef@oderland.se> 
Gesendet: Mittwoch, 13. Oktober 2021 16:19
An: VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at>; 'pve-devel@lists.proxmox.com' <pve-devel@lists.proxmox.com>
Betreff: Re: AW: [pve-devel] BUG in vlan aware bridge

Hi,

I can confirm that s > 12000 does not work in either setup:

setup A: tap(untagged, mtu 1500) -> vlan-aware bridge(mtu 9000) -> bond(mtu 9000)
setup B: tap(tagged, mtu 1500)   -> vlan-aware bridge(mtu 9000) -> bond(mtu 9000)

s > 12000: A doesn't work, B doesn't work
s > 8000:  A works,        B doesn't work

The traffic (one defragmented packet) is just dropped between the bridge and the tap. I tried my NOTRACK rules and they didn't have any effect.

Either we have a bug in my Mellanox cards here or in the kernel. I don't think this is a normal case.
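The size thresholds above follow from plain IPv4 fragmentation arithmetic; as a sanity check, here is a small illustrative shell sketch (not from the thread) computing the on-wire fragment sizes that netfilter would have to re-create after reassembly:

```shell
# IPv4 fragmentation arithmetic for an ICMP message of 2008 bytes
# (ping -s 2000, as captured in the tcpdump further down) at MTU 1500.
mtu=1500
l4=2008                           # ICMP header (8) + payload (2000)
per=$(( (mtu - 20) / 8 * 8 ))     # payload bytes per non-final fragment (1480)
frags=""
while [ "$l4" -gt $(( mtu - 20 )) ]; do
    frags="$frags$(( per + 20 )) "    # each non-final fragment: payload + 20-byte IP header
    l4=$(( l4 - per ))
done
frags="$frags$(( l4 + 20 ))"
echo "$frags"                     # 1500 548
```

The 1500/548 pair matches the two frames seen on the tap interface in the tcpdump quoted below; after reassembly the bridge emits a single 2028-byte IP packet (the vmbr0 capture), which is what the bond then drops.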

Kind regards
Josef Johansson

On 10/13/21 15:53, VELARTIS Philipp Dürhammer wrote:
> And what happens if you use a packet size > 9000? This should still 
> work... (because it gets fragmented)
>
> -----Original Message-----
> From: pve-devel <pve-devel-bounces@lists.proxmox.com> On behalf of 
> Josef Johansson
> Sent: Wednesday, 13 October 2021 13:37
> To: pve-devel@lists.proxmox.com
> Subject: Re: [pve-devel] BUG in vlan aware bridge
>
> Hi,
>
> AFAIK it's netfilter that does the defragmenting so that it can firewall.
>
> If you specify
>
> iptables -t raw -I PREROUTING -s 77.244.240.131 -j NOTRACK
>
> iptables -t raw -I PREROUTING -s 37.16.72.52 -j NOTRACK
>
> you should be able to make it ignore your packets.
>
>
> As a datapoint, I could ping fine from an MTU 1500 host, over MTU 9000 vlan-aware bridges with firewalls, to another MTU 1500 host.
>
> As you would assume, the packet is defragmented over MTU 9000 links and fragmented again over MTU 1500 devices.
>
> Kind regards
> Josef Johansson
>
> On 10/13/21 11:22, VELARTIS Philipp Dürhammer wrote:
>> HI,
>>
>>
>> Yes, I think it has nothing to do with the bonds but with the vlan-aware bridge interface.
>>
>> I see this with ping -s 1500
>>
>> On tap interface: 
>> 11:19:35.141414 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 39999, offset 0, flags [+], proto ICMP (1), length 1500)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 1480
>> 11:19:35.141430 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype IPv4 (0x0800), length 562: (tos 0x0, ttl 64, id 39999, offset 1480, flags [none], proto ICMP (1), length 548)
>>     37.16.72.52 > 77.244.240.131: ip-proto-1
>>
>> On vmbr0:
>> 11:19:35.141442 62:47:e0:fe:f9:31 > 54:e0:32:27:6e:50, ethertype 802.1Q (0x8100), length 2046: vlan 350, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 39999, offset 0, flags [none], proto ICMP (1), length 2028)
>>     37.16.72.52 > 77.244.240.131: ICMP echo request, id 2182, seq 4, 
>> length 2008
>>
>> On bond0 it's gone...
>>
>> But who is in charge of fragmenting the packets normally? The bridge itself? Netfilter?
>>
>> -----Original Message-----
>> From: pve-devel <pve-devel-bounces@lists.proxmox.com> On behalf of 
>> Stoyan Marinov
>> Sent: Wednesday, 13 October 2021 00:46
>> To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>> Subject: Re: [pve-devel] BUG in vlan aware bridge
>>
>> OK, I have just verified it has nothing to do with bonds. I get the same behavior with a vlan-aware bridge, bridge-nf-call-iptables=1, and a regular eth0 being part of the bridge. Packets arrive fragmented on the tap, are reassembled by netfilter, and are then re-injected into the bridge assembled (full size).
>>
>> I did have limited success by setting net.bridge.bridge-nf-filter-vlan-tagged to 1. Now packets seem to get fragmented on the way out and back in, but there are still issues:
>>
>> 1. I'm testing with ping -s 2000 (1500 mtu everywhere) to an external box. I do see reply packets arrive on the vm nic, but ping doesn't see them. Haven't analyzed much further.
>> 2. While watching with tcpdump (inside the vm) I notice "ip reassembly time exceeded" messages being generated from the vm.
>>
>> I'll try to investigate a bit further tomorrow.
>>
>>> On 12 Oct 2021, at 11:26 PM, Stoyan Marinov <stoyan@marinov.us> wrote:
>>>
>>> That's an interesting observation. Now that I think about it, it could be caused by bonding and not the underlying device. When I tested this (about a year ago) I was using bonding on the mlx adapters and not using bonding on the Intel ones.
>>>
>>>> On 12 Oct 2021, at 3:36 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>
>>>> HI,
>>>>
>>>> We use HP servers with Intel cards or the standard HP NIC (I think 
>>>> also Intel).
>>>>
>>>> Also, I see that I made a mistake:
>>>>
>>>> Setup working:
>>>> tapX (UNtagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> is correct. (before I had also tagged)
>>>>
>>>> It should be:
>>>>
>>>> Setup not working:
>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup working:
>>>> tapX (untagged) <- -> vmbr0 <- - > bond0
>>>>
>>>> Setup also working:
>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>
>>>> -----Original Message-----
>>>> From: pve-devel <pve-devel-bounces@lists.proxmox.com> On behalf of 
>>>> Stoyan Marinov
>>>> Sent: Tuesday, 12 October 2021 13:16
>>>> To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
>>>> Subject: Re: [pve-devel] BUG in vlan aware bridge
>>>>
>>>> I'm having the very same issue with Mellanox ethernet adapters. I don't see this behavior with Intel nics. What network cards do you have?
>>>>
>>>>> On 12 Oct 2021, at 1:48 PM, VELARTIS Philipp Dürhammer <p.duerhammer@velartis.at> wrote:
>>>>>
>>>>> HI,
>>>>>
>>>>> I have been playing around for days because we have strange packet losses.
>>>>> Finally I can report the following (Linux 5.11.22-4-pve, Proxmox 7, all devices MTU 1500):
>>>>>
>>>>> Packets with sizes > 1500 work well without VLAN, but the moment they are tagged they are dropped by the bond device.
>>>>> Netfilter (set to 1) always reassembles the packets when they arrive at a bridge. But they don't get fragmented again if they are VLAN tagged, so the bond device drops them. If the bridge is NOT VLAN aware they do get fragmented again and it works well.
>>>>>
>>>>> Setup not working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup working:
>>>>>
>>>>> tapX (tagged) <- -> vmbr0 <- - > bond0
>>>>>
>>>>> Setup also working:
>>>>>
>>>>> tapX < - - > vmbr0v350 < -- > bond0.350 < -- > bond0
>>>>>
>>>>> Have you got any idea where to search? I don't understand who is 
>>>>> in charge of fragmenting packets again if they get reassembled by 
>>>>> netfilter (and why it is not working with vlan aware bridges).
>>>>>
>>>>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-13 14:32                 ` VELARTIS Philipp Dürhammer
@ 2021-10-14  5:14                   ` Josef Johansson
  2021-10-14  5:40                     ` Josef Johansson
  0 siblings, 1 reply; 15+ messages in thread
From: Josef Johansson @ 2021-10-14  5:14 UTC (permalink / raw)
  To: VELARTIS Philipp Dürhammer, 'pve-devel@lists.proxmox.com'

Hi,

I did some more digging, searching for 'bridge-nf-call-iptables fragmentation'.

Found these forum posts:

https://forum.proxmox.com/threads/net-bridge-bridge-nf-call-iptables-and-friends.64766/

https://forum.proxmox.com/threads/linux-bridge-reassemble-fragmented-packets.96432/

And this patch, which seems like they at least TRIED to get it fixed ;)

https://lists.linuxfoundation.org/pipermail/bridge/2019-August/012185.html

Kind regards
Josef Johansson

On 10/13/21 16:32, VELARTIS Philipp Dürhammer wrote:
> If you stop the pve-firewall service and echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables (you stop the netfilter hook),
> then it also works for me with tagged tap devices and a VLAN-aware bridge. I think it is a kernel bug.
> What I don't understand is why more people aren't reporting it...



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pve-devel] BUG in vlan aware bridge
  2021-10-14  5:14                   ` Josef Johansson
@ 2021-10-14  5:40                     ` Josef Johansson
  0 siblings, 0 replies; 15+ messages in thread
From: Josef Johansson @ 2021-10-14  5:40 UTC (permalink / raw)
  To: VELARTIS Philipp Dürhammer, 'pve-devel@lists.proxmox.com'

This is one of the commits in include/net/ip.h...

I'd say someone should look over this and fix it :)

commit 93fdd47e52f3f869a437319db9da1ea409acc07e
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sun Oct 5 12:00:22 2014 +0800

    bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
    
    As we may defragment the packet in IPv4 PRE_ROUTING and refragment
    it after POST_ROUTING we should save the value of frag_max_size.
    
    This is still very wrong as the bridge is supposed to leave the
    packets intact, meaning that the right thing to do is to use the
    original frag_list for fragmentation.
    
    Unfortunately we don't currently guarantee that the frag_list is
    left untouched throughout netfilter so until this changes this is
    the best we can do.
    
    There is also a spot in FORWARD where it appears that we can
    forward a packet without going through fragmentation, mark it
    so that we can fix it later.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index a615264cf01a..4063898cf8aa 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -404,6 +404,7 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
                              ETH_HLEN-ETH_ALEN);
             /* tell br_dev_xmit to continue with forwarding */
             nf_bridge->mask |= BRNF_BRIDGED_DNAT;
+            /* FIXME Need to refragment */
             ret = neigh->output(neigh, skb);
         }
         neigh_release(neigh);
@@ -459,6 +460,10 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
     struct nf_bridge_info *nf_bridge = skb->nf_bridge;
     struct rtable *rt;
     int err;
+    int frag_max_size;
+
+    frag_max_size = IPCB(skb)->frag_max_size;
+    BR_INPUT_SKB_CB(skb)->frag_max_size = frag_max_size;
 
     if (nf_bridge->mask & BRNF_PKT_TYPE) {
         skb->pkt_type = PACKET_OTHERHOST;
@@ -863,13 +868,19 @@ static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops,
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
     int ret;
+    int frag_max_size;
 
+    /* This is wrong! We should preserve the original fragment
+     * boundaries by preserving frag_list rather than refragmenting.
+     */
     if (skb->protocol == htons(ETH_P_IP) &&
         skb->len + nf_bridge_mtu_reduction(skb) > skb->dev->mtu &&
         !skb_is_gso(skb)) {
+        frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
         if (br_parse_ip_options(skb))
             /* Drop invalid packet */
             return NF_DROP;
+        IPCB(skb)->frag_max_size = frag_max_size;
         ret = ip_fragment(skb, br_dev_queue_push_xmit);
     } else
         ret = br_dev_queue_push_xmit(skb);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index b6c04cbcfdc5..2398369c6dda 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -305,10 +305,14 @@ struct net_bridge
 
 struct br_input_skb_cb {
     struct net_device *brdev;
+
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
     int igmp;
     int mrouters_only;
 #endif
+
+    u16 frag_max_size;
+
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
     bool vlan_filtered;
 #endif

Kind regards
Josef Johansson

On 10/14/21 07:14, Josef Johansson wrote:
> Hi,
>
> I did some more digging searching for 'bridge-nf-call-iptables
> fragmentation'
>
> Found these forum posts:
>
> https://forum.proxmox.com/threads/net-bridge-bridge-nf-call-iptables-and-friends.64766/
>
> https://forum.proxmox.com/threads/linux-bridge-reassemble-fragmented-packets.96432/
>
> And this patch, which seems like they at least TRIED to get it fixed ;)
>
> https://lists.linuxfoundation.org/pipermail/bridge/2019-August/012185.html
>
> Kind regards
> Josef Johansson



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-10-14  5:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 10:48 [pve-devel] BUG in vlan aware bridge VELARTIS Philipp Dürhammer
2021-10-12 11:16 ` Stoyan Marinov
2021-10-12 12:36   ` VELARTIS Philipp Dürhammer
2021-10-12 20:26     ` Stoyan Marinov
2021-10-12 22:45       ` Stoyan Marinov
2021-10-12 23:03         ` Stoyan Marinov
2021-10-13  9:22         ` VELARTIS Philipp Dürhammer
2021-10-13 11:36           ` Josef Johansson
2021-10-13 13:47             ` VELARTIS Philipp Dürhammer
2021-10-13 13:53               ` Josef Johansson
2021-10-13 13:53             ` VELARTIS Philipp Dürhammer
2021-10-13 14:19               ` Josef Johansson
2021-10-13 14:32                 ` VELARTIS Philipp Dürhammer
2021-10-14  5:14                   ` Josef Johansson
2021-10-14  5:40                     ` Josef Johansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal