From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH kernel 0/5] backport nftables atomicity fix
Date: Thu, 11 Sep 2025 12:05:41 +0200
Message-ID: <20250911100555.63174-1-g.goller@proxmox.com>

Stefan Hanreich discovered an nftables bug that breaks atomicity when updating
certain sets: while a set is being updated, packets can slip through even
though both the existing and the incoming rules deny them.
A full reproducer is included below: [0].
More information is in the commit messages of the individual patches.

The upstream series has not been applied yet, but is available here:
https://lore.kernel.org/netfilter-devel/20250910080227.11174-1-fw@strlen.de/

Nftables has changed quite a bit since 6.14, so the backport was a bit tricky --
a few Tested-by's would be nice :). If anyone needs help reproducing this or
wants a pre-built kernel with the fix, feel free to reach out!

Thanks to Stefan Hanreich for identifying the bug and providing a minimal
reproducer, and to Florian Westphal for the quick fix.

[0]:
Initial network setup:

ip netns add east
ip netns add west

ip link add east type veth peer name west

ip link set east netns east
ip link set west netns west

ip netns exec east ip a a 192.0.2.20/24 dev east

ip netns exec west ip link add br0 type bridge
ip netns exec west ip a a 192.0.2.10/24 dev br0
ip netns exec west ip link set west master br0

ip netns exec east ip link set up east
ip netns exec west ip link set up west
ip netns exec west ip link set up br0


Initial nft ruleset in network namespace 'west':

table bridge west {
  set east-ip-nomatch {
    type ipv4_addr
    flags interval;
    elements = { 0.0.0.0-192.0.2.19, 192.0.2.21-255.255.255.255 }
  }

  chain block-spoofed {
    type filter hook prerouting priority filter; policy accept;
    ip saddr @east-ip-nomatch drop
  }
}
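
To make the expected behaviour concrete, the set can be modelled in a few
lines of Python. This is only a sketch of the intended semantics, not of how
nftables implements interval sets; the helper name would_drop is made up for
illustration.

```python
from ipaddress import IPv4Address

# Intervals copied from the east-ip-nomatch set above (bounds are inclusive).
EAST_IP_NOMATCH = [
    (IPv4Address("0.0.0.0"), IPv4Address("192.0.2.19")),
    (IPv4Address("192.0.2.21"), IPv4Address("255.255.255.255")),
]


def would_drop(saddr: str) -> bool:
    """Return True if 'ip saddr @east-ip-nomatch drop' should match saddr."""
    addr = IPv4Address(saddr)
    return any(low <= addr <= high for low, high in EAST_IP_NOMATCH)


if __name__ == "__main__":
    for saddr in ("192.0.2.20", "192.0.2.30", "10.0.0.1"):
        print(saddr, "drop" if would_drop(saddr) else "accept")
```

Only 192.0.2.20 falls into the gap between the two intervals, so every other
source address is expected to be dropped at all times.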


This ruleset should block all traffic on the bridge br0 that does not have
192.0.2.20 as its source IP address. But when continuously flushing and
re-populating the east-ip-nomatch set via the following commands:

$ while true; do ip netns exec west nft -j -f update_set.json; done;

# update_set.json
{
  "nftables": [
    {
      "add": {
        "set": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch",
          "type": "ipv4_addr",
          "flags": [
            "interval"
          ]
        }
      }
    },
    {
      "flush": {
        "set": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch"
        }
      }
    },
    {
      "add": {
        "element": {
          "family": "bridge",
          "table": "west",
          "name": "east-ip-nomatch",
          "elem": [
            {
              "range": ["0.0.0.0", "192.0.2.19"]
            },
            {
              "range": ["192.0.2.21", "255.255.255.255"]
            }
          ]
        }
      }
    }
  ]
}
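
For anyone who wants to run the reproducer with a different allowed address,
the batch above could be generated instead of written by hand. The following
is a rough sketch, assuming the same table and set names as above; the
function names complement_ranges and build_update are hypothetical and not
part of any tool.

```python
import json
from ipaddress import IPv4Address


def complement_ranges(allowed: str) -> list:
    """Return nft JSON range elements covering every IPv4 address except `allowed`."""
    addr = IPv4Address(allowed)
    first, last = IPv4Address("0.0.0.0"), IPv4Address("255.255.255.255")
    ranges = []
    if addr > first:
        ranges.append({"range": [str(first), str(addr - 1)]})
    if addr < last:
        ranges.append({"range": [str(addr + 1), str(last)]})
    return ranges


def build_update(allowed: str) -> dict:
    """Build the 'add set' / 'flush set' / 'add element' batch shown above."""
    ident = {"family": "bridge", "table": "west", "name": "east-ip-nomatch"}
    return {
        "nftables": [
            {"add": {"set": {**ident, "type": "ipv4_addr", "flags": ["interval"]}}},
            {"flush": {"set": dict(ident)}},
            {"add": {"element": {**ident, "elem": complement_ranges(allowed)}}},
        ]
    }


if __name__ == "__main__":
    # Redirect the output to update_set.json to use it with the loop above.
    print(json.dumps(build_update("192.0.2.20"), indent=2))
```

The flush and the re-add are part of one batch, which is why no packet should
ever observe an empty or partially filled set.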


And then continuously sending ICMP packets with a spoofed source address
(192.0.2.30) from east to west, e.g. via scapy:

$ ip netns exec east python3 -c '
from scapy.all import send, IP, ICMP
send(IP(src="192.0.2.30", dst="192.0.2.10")/ICMP(id=2222, seq=42),
     count=1000000, inter=0.001)'



Some of them pass through, as is visible via tcpdump (it is sometimes
necessary to terminate the process for the packets to become visible, since
the buffers do not get flushed immediately):

$ ip netns exec west tcpdump -envi br0 icmp

tcpdump: listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:11:10.008758 06:a4:e8:d4:db:20 > 8a:88:57:79:f6:97, ethertype IPv4 (0x0800), length 42: (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 28)
    192.0.2.30 > 192.0.2.10: ICMP echo request, id 2222, seq 42, length 8
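
When testing a kernel with and without the fix, counting the leaked packets
may be easier than reading the capture by eye. A minimal sketch, assuming the
verbose tcpdump output format shown above; the names ALLOWED_SRC, IP_LINE and
leaked_sources are made up for illustration, and SAMPLE is just the capture
from above.

```python
import re

ALLOWED_SRC = "192.0.2.20"  # the only source address the ruleset lets through

# Matches the indented IP line of the verbose tcpdump output, e.g.
#     192.0.2.30 > 192.0.2.10: ICMP echo request, id 2222, seq 42, length 8
IP_LINE = re.compile(r"^\s+(\d+(?:\.\d+){3}) > (\d+(?:\.\d+){3}): ICMP echo request")

SAMPLE = """\
17:11:10.008758 06:a4:e8:d4:db:20 > 8a:88:57:79:f6:97, ethertype IPv4 (0x0800), length 42: (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 28)
    192.0.2.30 > 192.0.2.10: ICMP echo request, id 2222, seq 42, length 8
"""


def leaked_sources(lines):
    """Return the source address of every echo request not sent from ALLOWED_SRC."""
    leaked = []
    for line in lines:
        match = IP_LINE.match(line)
        if match and match.group(1) != ALLOWED_SRC:
            leaked.append(match.group(1))
    return leaked


if __name__ == "__main__":
    leaks = leaked_sources(SAMPLE.splitlines())
    print(f"{len(leaks)} packet(s) slipped through")
```

With the fix applied, the expectation is that a capture taken during the
update loop yields an empty list.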

pve-kernel:

Gabriel Goller (5):
  kernel: backport: netfilter: nft_set_pipapo: don't check genbit from
    packetpath lookups
  kernel: backport: netfilter: nft_set_rbtree: continue traversal if
    element is inactive
  kernel: backport: netfilter: nf_tables: place base_seq in struct net
  kernel: backport: netfilter: nf_tables: make nft_set_do_lookup
    available unconditionally
  kernel: backport: netfilter: nf_tables: restart set lookup on base_seq
    change

 ...t_pipapo-don-t-check-genbit-from-pac.patch | 160 +++++++++
 ...t_rbtree-continue-traversal-if-eleme.patch |  88 +++++
 ..._tables-place-base_seq-in-struct-net.patch | 310 ++++++++++++++++++
 ...les-make-nft_set_do_lookup-available.patch |  86 +++++
 ...les-restart-set-lookup-on-base_seq-c.patch | 148 +++++++++
 5 files changed, 792 insertions(+)
 create mode 100644 patches/kernel/0014-netfilter-nft_set_pipapo-don-t-check-genbit-from-pac.patch
 create mode 100644 patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch
 create mode 100644 patches/kernel/0016-netfilter-nf_tables-place-base_seq-in-struct-net.patch
 create mode 100644 patches/kernel/0017-netfilter-nf_tables-make-nft_set_do_lookup-available.patch
 create mode 100644 patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch


Summary over all repositories:
  5 files changed, 792 insertions(+), 0 deletions(-)

-- 
Generated by git-murpp 0.8.0


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

