* [pve-devel] [PATCH kernel 0/5] backport nftables atomicity fix
@ 2025-09-11 10:05 Gabriel Goller
From: Gabriel Goller @ 2025-09-11 10:05 UTC
  To: pve-devel

Stefan Hanreich discovered an nftables bug that breaks atomicity when updating
certain sets: while a set is being updated, packets sometimes slip through even
though both the existing and the incoming rules deny them.
A full reproducer is available below: [0].
More information is in the commit messages that follow.

The upstream series has not been applied yet, but is available here:
https://lore.kernel.org/netfilter-devel/20250910080227.11174-1-fw@strlen.de/

nftables has changed quite a bit since 6.14, so the backport was a bit tricky -- a
few Tested-by's would be nice :). If anyone needs help reproducing this or
wants a pre-built kernel with the fix, feel free to reach out!

Thanks to Stefan Hanreich for identifying the bug and providing a minimal
reproducer, and to Florian Westphal for the quick fix.

[0]:
Initial network setup:

ip netns add east
ip netns add west

ip link add east type veth peer name west

ip link set east netns east
ip link set west netns west

ip netns exec east ip a a 192.0.2.20/24 dev east

ip netns exec west ip link add br0 type bridge
ip netns exec west ip a a 192.0.2.10/24 dev br0
ip netns exec west ip link set west master br0

ip netns exec east ip link set up east
ip netns exec west ip link set up west
ip netns exec west ip link set up br0


Initial nft ruleset in network namespace 'west':

table bridge west {
  set east-ip-nomatch {
    type ipv4_addr
    flags interval;
    elements = { 0.0.0.0-192.0.2.19, 192.0.2.21-255.255.255.255 }
  }

  chain block-spoofed {
    type filter hook prerouting priority filter; policy accept;
    ip saddr @east-ip-nomatch drop
  }
}


This should block all traffic on the bridge br0 that does not have
192.0.2.20 as its source IP address, but when continuously flushing and
re-creating the east-ip-nomatch set via the following command:

$ while true; do ip netns exec west nft -j -f update_set.json; done;

# update_set.json
{
  "nftables": [
    {
      "add": {
        "set": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch",
          "type": "ipv4_addr",
          "flags": [
            "interval"
          ]
        }
      }
    },
    {
      "flush": {
        "set": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch"
        }
      }
    },
    {
      "add": {
        "element": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch",
          "elem": [
            {
              "range": ["0.0.0.0", "192.0.2.19"]
            },
            {
              "range": ["192.0.2.21", "255.255.255.255"]
            }
          ]
        }
      }
    }
  ]
}
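
Since all three commands sit in one file, nft -f should submit them as a
single transaction, which is why no packet is expected to ever see the set
without its elements. If the intervals need to be varied while testing, the
file can also be generated instead of edited by hand. A minimal sketch,
assuming the same family, table and set names as above; the helper name
build_update is hypothetical and not part of any nftables tooling:

```python
import json


def build_update(family, table, name, ranges):
    """Build the add-set / flush-set / add-element command list.

    Hypothetical helper for illustration; it only assembles the same
    JSON structure as the hand-written update_set.json.
    """
    set_ref = {"family": family, "table": table, "name": name}
    return {
        "nftables": [
            # (Re-)declare the interval set.
            {"add": {"set": {**set_ref, "type": "ipv4_addr",
                             "flags": ["interval"]}}},
            # Drop all existing elements.
            {"flush": {"set": dict(set_ref)}},
            # Re-add the intervals in the same transaction.
            {"add": {"element": {**set_ref,
                                 "elem": [{"range": [lo, hi]}
                                          for lo, hi in ranges]}}},
        ]
    }


doc = build_update(
    "bridge", "west", "east-ip-nomatch",
    [("0.0.0.0", "192.0.2.19"), ("192.0.2.21", "255.255.255.255")],
)
print(json.dumps(doc, indent=2))
```

Redirecting the output to update_set.json gives a file with the same content
as the one above.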


And then continuously sending ICMP packets from east to west, e.g. via scapy:

$ ip netns exec east python3 -c 'from scapy.all import send, Ether, IP, ICMP; send(IP(src="192.0.2.30", dst="192.0.2.10")/ICMP(id=2222, seq=42), count=1000000, inter=0.001)'



Some of them pass through, as is visible via tcpdump (sometimes it is
required to terminate the process for the packets to be visible, since
the buffers do not get flushed immediately):

$ ip netns exec west tcpdump -envi br0 icmp

tcpdump: listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:11:10.008758 06:a4:e8:d4:db:20 > 8a:88:57:79:f6:97, ethertype IPv4 (0x0800), length 42: (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 28)
    192.0.2.30 > 192.0.2.10: ICMP echo request, id 2222, seq 42, length 8
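
For reference, the captured source address 192.0.2.30 lies inside the second
interval of east-ip-nomatch, so the drop rule should have matched it. A minimal
Python sketch of that check follows. It only models the interval arithmetic of
the set and says nothing about how the kernel performs the lookup; the helper
name in_nomatch_set is made up for this illustration:

```python
import ipaddress

# Intervals copied from the east-ip-nomatch set (bounds are inclusive).
NOMATCH_RANGES = [
    ("0.0.0.0", "192.0.2.19"),
    ("192.0.2.21", "255.255.255.255"),
]


def in_nomatch_set(addr: str) -> bool:
    """Return True if addr falls into one of the set's intervals."""
    ip = ipaddress.IPv4Address(addr)
    return any(
        ipaddress.IPv4Address(lo) <= ip <= ipaddress.IPv4Address(hi)
        for lo, hi in NOMATCH_RANGES
    )


# 192.0.2.20 is the only source address the ruleset is meant to accept.
for src in ("192.0.2.20", "192.0.2.30", "192.0.2.19", "192.0.2.21"):
    verdict = "drop" if in_nomatch_set(src) else "accept"
    print(f"{src}: {verdict}")
```

Only 192.0.2.20 falls outside both intervals, so with a consistent view of the
set every packet from 192.0.2.30 is expected to be dropped.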

pve-kernel:

Gabriel Goller (5):
  kernel: backport: netfilter: nft_set_pipapo: don't check genbit from
    packetpath lookups
  kernel: backport: netfilter: nft_set_rbtree: continue traversal if
    element is inactive
  kernel: backport: netfilter: nf_tables: place base_seq in struct net
  kernel: backport: netfilter: nf_tables: make nft_set_do_lookup
    available unconditionally
  kernel: backport: netfilter: nf_tables: restart set lookup on base_seq
    change

 ...t_pipapo-don-t-check-genbit-from-pac.patch | 160 +++++++++
 ...t_rbtree-continue-traversal-if-eleme.patch |  88 +++++
 ..._tables-place-base_seq-in-struct-net.patch | 310 ++++++++++++++++++
 ...les-make-nft_set_do_lookup-available.patch |  86 +++++
 ...les-restart-set-lookup-on-base_seq-c.patch | 148 +++++++++
 5 files changed, 792 insertions(+)
 create mode 100644 patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch
 create mode 100644 patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch
 create mode 100644 patches/kernel/0016-netfilter-nf_tables-place-base_seq-in-struct-net.patch
 create mode 100644 patches/kernel/0017-netfilter-nf_tables-make-nft_set_do_lookup-available.patch
 create mode 100644 patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch


Summary over all repositories:
  5 files changed, 792 insertions(+), 0 deletions(-)

-- 
Generated by git-murpp 0.8.0

