public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Gabriel Goller <g.goller@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH pve-kernel 5/5] kernel: backport: netfilter: nf_tables: restart set lookup on base_seq change
Date: Thu, 11 Sep 2025 12:05:46 +0200	[thread overview]
Message-ID: <20250911100555.63174-6-g.goller@proxmox.com> (raw)
In-Reply-To: <20250911100555.63174-1-g.goller@proxmox.com>

If a set lookup fails, retry again if the base_seq has changed. This
means that if a lookup does not return a value, check if there was a
generation change in the meantime. If there was, redo the lookup. Exact
explanation is inside the patch.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 ...les-restart-set-lookup-on-base_seq-c.patch | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch

diff --git a/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch b/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch
new file mode 100644
index 000000000000..26b28ffc0aa1
--- /dev/null
+++ b/patches/kernel/0018-netfilter-nf_tables-restart-set-lookup-on-base_seq-c.patch
@@ -0,0 +1,148 @@
+From 35dd20f6e27b4a6b0b6c2cbf6ebf8eab33121378 Mon Sep 17 00:00:00 2001
+From: Gabriel Goller <g.goller@proxmox.com>
+Date: Wed, 10 Sep 2025 12:18:18 +0200
+Subject: [PATCH 5/5] netfilter: nf_tables: restart set lookup on base_seq
+ change
+
+The hash, hash_fast, rhash and bitwise sets may indicate no result even
+though a matching element exists during a short time window while other
+cpu is finalizing the transaction.
+
+This happens when the hash lookup/bitwise lookup function has picked up
+the old genbit, right before it was toggled by nf_tables_commit(), but
+then the same cpu managed to unlink the matching old element from the
+hash table:
+
+cpu0					cpu1
+  has added new elements to clone
+  has marked elements as being
+  inactive in new generation
+					perform lookup in the set
+  enters commit phase:
+					A) observes old genbit
+   increments base_seq
+I) increments the genbit
+II) removes old element from the set
+					B) finds matching element
+					C) returns no match: found
+					element is not valid in old
+					generation
+
+					Next lookup observes new genbit and
+					finds matching e2.
+
+Consider a packet matching element e1, e2.
+
+cpu0 processes following transaction:
+1. remove e1
+2. adds e2, which has same key as e1.
+
+P matches both e1 and e2.  Therefore, cpu1 should always find a match
+for P. Due to above race, this is not the case:
+
+cpu1 observed the old genbit.  e2 will not be considered once it is found.
+The element e1 is not found anymore if cpu0 managed to unlink it from the
+hlist before cpu1 found it during list traversal.
+
+The situation only occurs for a brief time period, lookups happening
+after I) observe new genbit and return e2.
+
+This problem exists in all set types except nft_set_pipapo, so fix it once
+in nft_lookup rather than each set ops individually.
+
+Sample the base sequence counter, which gets incremented right before the
+genbit is changed.
+
+Then, if no match is found, retry the lookup if the base sequence was
+altered in between.
+
+If the base sequence hasn't changed:
+ - No update took place: no-match result is expected.
+   This is the common case.  or:
+ - nf_tables_commit() hasn't progressed to genbit update yet.
+   Old elements were still visible and nomatch result is expected, or:
+ - nf_tables_commit updated the genbit:
+   We picked up the new base_seq, so the lookup function also picked
+   up the new genbit, no-match result is expected.
+
+If the old genbit was observed, then nft_lookup also picked up the old
+base_seq: nft_lookup_should_retry() returns true and relookup is performed
+in the new generation.
+
+Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
+patchset, for suggesting re-use of existing base_seq and placement of
+the restart loop in nft_set_do_lookup().
+
+Signed-off-by: Florian Westphal <fw@strlen.de>
+[GG: this patch needed a bigger change because __nft_set_do_lookup
+doesn't ext pointer yet, but still returns a bool, so we just check if
+the bool is false, so if no element has been found and retry.]
+Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
+---
+ net/netfilter/nf_tables_api.c |  3 ++-
+ net/netfilter/nft_lookup.c    | 32 +++++++++++++++++++++++++++++++-
+ 2 files changed, 33 insertions(+), 2 deletions(-)
+
+diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
+index f9e7f056ea5b..40a32ef86470 100644
+--- a/net/netfilter/nf_tables_api.c
++++ b/net/netfilter/nf_tables_api.c
+@@ -10733,7 +10733,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
+ 	while (++base_seq == 0)
+ 		;
+ 
+-	WRITE_ONCE(net->nft.base_seq, base_seq);
++	/* pairs with smp_load_acquire in nft_lookup_eval */
++	smp_store_release(&net->nft.base_seq, base_seq);
+ 
+ 	gc_seq = nft_gc_seq_begin(nft_net);
+ 
+diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
+index 7d0add1041bb..fc0eb91d94c0 100644
+--- a/net/netfilter/nft_lookup.c
++++ b/net/netfilter/nft_lookup.c
+@@ -53,10 +53,40 @@ static bool __nft_set_do_lookup(const struct net *net, const struct nft_set *set
+ #endif
+ 	return set->ops->lookup(net, set, key, ext);
+ }
++
++static unsigned int nft_base_seq(const struct net *net)
++{
++	/* pairs with smp_store_release() in nf_tables_commit() */
++	return smp_load_acquire(&net->nft.base_seq);
++}
++
++static bool nft_lookup_should_retry(const struct net *net, unsigned int seq)
++{
++	return unlikely(seq != nft_base_seq(net));
++}
++
+ bool nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+ 		       const u32 *key, const struct nft_set_ext **ext)
+ {
+-	return __nft_set_do_lookup(net, set, key, ext);
++	bool result;
++	unsigned int base_seq;
++
++	do {
++		base_seq = nft_base_seq(net);
++
++		result = __nft_set_do_lookup(net, set, key, ext);
++		if (result)
++			break;
++		/* No match?  There is a small chance that lookup was
++		 * performed in the old generation, but nf_tables_commit()
++		 * already unlinked a (matching) element.
++		 *
++		 * We need to repeat the lookup to make sure that we didn't
++		 * miss a matching element in the new generation.
++		 */
++	} while (nft_lookup_should_retry(net, base_seq));
++
++	return result;
+ }
+ EXPORT_SYMBOL_GPL(nft_set_do_lookup);
+ 
+-- 
+2.47.3
+
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


      parent reply	other threads:[~2025-09-11 10:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11 10:05 [pve-devel] [PATCH kernel 0/5] backport nftables atomicity fix Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 1/5] kernel: backport: netfilter: nft_set_pipapo: don't check genbit from packetpath lookups Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 2/5] kernel: backport: netfilter: nft_set_rbtree: continue traversal if element is inactive Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 3/5] kernel: backport: netfilter: nf_tables: place base_seq in struct net Gabriel Goller
2025-09-11 10:05 ` [pve-devel] [PATCH pve-kernel 4/5] kernel: backport: netfilter: nf_tables: make nft_set_do_lookup available unconditionally Gabriel Goller
2025-09-11 10:05 ` Gabriel Goller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250911100555.63174-6-g.goller@proxmox.com \
    --to=g.goller@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal