From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-kernel 2/5] kernel: backport: netfilter: nft_set_rbtree: continue traversal if element is inactive
Date: Thu, 11 Sep 2025 12:05:43 +0200
Message-ID: <20250911100555.63174-3-g.goller@proxmox.com>
In-Reply-To: <20250911100555.63174-1-g.goller@proxmox.com>

If the rbtree lookup finds a matching element, only record it as the
interval start when it is active and not expired, so that the traversal
continues instead of finishing on an inactive element.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
...t_rbtree-continue-traversal-if-eleme.patch | 88 +++++++++++++++++++
1 file changed, 88 insertions(+)
create mode 100644 patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch
diff --git a/patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch b/patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch
new file mode 100644
index 000000000000..9e4d4d687003
--- /dev/null
+++ b/patches/kernel/0015-netfilter-nft_set_rbtree-continue-traversal-if-eleme.patch
@@ -0,0 +1,88 @@
+From 2af0ed300431a3c5675cd6a7219424430fa9651b Mon Sep 17 00:00:00 2001
+From: Gabriel Goller <g.goller@proxmox.com>
+Date: Wed, 10 Sep 2025 12:08:56 +0200
+Subject: [PATCH 2/5] netfilter: nft_set_rbtree: continue traversal if element
+ is inactive
+
+When the rbtree lookup function finds a match in the rbtree, it sets the
+range start interval to a potentially inactive element.
+
+Then, after tree lookup, if the matching element is inactive, it returns
+NULL and suppresses a matching result.
+
+This is wrong and leads to false negative matches when a transaction has
+already entered the commit phase.
+
+cpu0                                    cpu1
+  has added new elements to clone
+  has marked elements as being
+  inactive in new generation
+                                        perform lookup in the set
+  enters commit phase:
+I) increments the genbit
+                                        A) observes new genbit
+                                        B) finds matching range
+                                        C) returns no match: found
+                                        range invalid in new generation
+II) removes old elements from the tree
+                                        C New nft_lookup happening now
+                                          will find matching element,
+                                          because it is no longer
+                                          obscured by old, inactive one.
+
+Consider a packet P matching range r1-r2:
+
+cpu0 processes the following transaction:
+1. remove r1-r2
+2. add r1-r3
+
+P is contained in both ranges. Therefore, cpu1 should always find a match
+for P. Due to the above race, this is not the case:
+
+cpu1 does find r1-r2, but then ignores it due to the genbit indicating
+the range has been removed. It does NOT test for further matches.
+
+The situation persists for all lookups until cpu0 reaches II), after
+which the r1-r3 range start node is tested for the first time.
+
+Move the "interval start is valid" check ahead so that tree traversal
+continues if the starting interval is not valid in this generation.
+
+Thanks to Stefan Hanreich for providing an initial reproducer for this
+bug.
+
+Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
+Fixes: c1eda3c6394f ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
+Signed-off-by: Florian Westphal <fw@strlen.de>
+Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
+---
+ net/netfilter/nft_set_rbtree.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
+index 2e8ef16ff191..c4eb94258e24 100644
+--- a/net/netfilter/nft_set_rbtree.c
++++ b/net/netfilter/nft_set_rbtree.c
+@@ -77,7 +77,9 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
+ 			    nft_rbtree_interval_end(rbe) &&
+ 			    nft_rbtree_interval_start(interval))
+ 				continue;
+-			interval = rbe;
++			if (nft_set_elem_active(&rbe->ext, genmask) &&
++			    !nft_rbtree_elem_expired(rbe))
++				interval = rbe;
+ 		} else if (d > 0)
+ 			parent = rcu_dereference_raw(parent->rb_right);
+ 		else {
+@@ -103,8 +105,6 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
+ 	}
+
+ 	if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
+-	    nft_set_elem_active(&interval->ext, genmask) &&
+-	    !nft_rbtree_elem_expired(interval) &&
+ 	    nft_rbtree_interval_start(interval)) {
+ 		*ext = &interval->ext;
+ 		return true;
+--
+2.47.3
+
--
2.47.3
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel